ABSTRACT

Statisticians have long used moving average type
smoothing and classical regression analysis techniques to
reduce the variability in data sets and enhance the visual
information presented by scatterplots. This thesis examines
the effectiveness of Robust Locally Weighted Regression
Scatterplot Smoothing (LOWESS), a procedure that differs
from other techniques because it smooths all of the points
and works on unequally as well as equally spaced data. The
LOWESS procedure is evaluated by comparing it to previously
validated uniform and cosine weighted moving average and
least squares regression programs. Interactive APL and
FORTRAN programs and detailed user instructions are included
for use by interested readers.
I.  INTRODUCTION

A.   BACKGROUND

The two dimensional scatter plot has been hailed by many
statisticians as being the single most powerful tool used in
exploratory data analysis [Ref. 1]. A scatter plot presents
an entire data set in a compact, unambiguous and easily
understandable format, in which either:

1.  the  points  lie  in  a  nearly  straight  line; 

2.  the  points  almost  lie  on  a  smooth  curve; 

3.  the points are scattered without any apparent
correlation between the X variables and the Y variables;

4.  the points lie somewhere between (1) or (2) and (3);

5.  most of the points lie near a straight line or smooth
curve but a few outliers are separated from the rest.
[Ref. 2]

These patterns or other hidden peculiarities are much easier
to discover during a brief glimpse at a well prepared
scatter plot than during an examination of a data table. For
example, the strong positive correlation between total users
and active users logged on to the W.R. Church computer
system, Figure 1.1, is more easily discerned from the
plotted points than from the tabulated data¹. This is a
good example of case (1), described above.

Not only does this plot point out the positive trend in
the data, it also demonstrates that it is nearly linear and
provides a rough estimate of the relationship between the
variables.


¹ The table in Figure 1.1 contains only a small portion of
the 472 data points included in the plot. A complete listing
of the data set takes approximately two pages of text and is
not required for demonstration purposes.
Figure 1.1    Comparison of Data Presentation Methods.

More precise mathematical expressions and confirmatory
procedures, including goodness of fit measures, can be
obtained by employing classical regression analysis
techniques, a logical enhancement of simple scatter plots,
Figure 1.2. Numerical quantifications such as the Pearson
product moment correlation also provide summaries but can be
ambiguous if not accompanied by other information
[Ref. 1, p. 77].

Scatter plots are not invulnerable to misinterpretation.
When the scatter of points falls into category (4) or (5),
as in Figure 1.3, it may not be possible to judge the true
relationship between the variables during a quick glance at
the scatter plot, although there obviously is some
relationship. Figure 1.3 contains a plot of the first 200
points of test set two (Appendix C), which is used in
Chapter III, Section 2 to test LOWESS' ability to follow
abrupt changes in curvature.
Figure 1.2   Linear Least Squares Regression of
Active Users on Total Users Logged on to the
W.R. Church Computer System.


Figure 1.3   Scatter Plot of the First 200 Points
of Test Set Two.


Initial inspection of this data suggests the presence of
a quadratic type pattern. This impression leads naturally to
using the quadratic least squares regression line of Figure
1.4 to describe the dependence of Y on X. The accompanying
analysis of variance table lends some support to this
choice, since $r^2 = .709$.

ANALYSIS OF VARIANCE TABLE

Figure 1.4    Quadratic Regression on the First 200
Points of Test Set Two.

A closer examination of this data reveals, however, that
although it looks quadratic, the actual dependence of Y on X
is not described quite that simply. Figure 1.5 demonstrates
this point very clearly. Splitting the data set into three
parts at what appear to be logical break points (X = 10, 25),
and fitting a linear least squares regression line to each,
shows that Y is not a single function of X over its entire
range. In fact, there appear to be three separate linear
trends in this data.

Analyses of this type are seldom undertaken because of
the tedium involved in selecting appropriate splitting
points once it has been determined that doing so may be
helpful.

Figure 1.5    Linear Regressions on First 200 Points of
Test Set Two Split at X = 10 and 25.

How, then, can an analyst discover the existence of
subtle trends or define the shape of unusual patterns
contained in a scatter plot? The answer is to use local
smoothing procedures rather than global (regression) fitting
techniques. Using a flexible smoothing procedure that
responds to local changes in the data structure allows the
data itself to determine the shape of the final curve, as
opposed to the classical approach of fitting polynomials
which have predetermined shapes.

The Robust Locally Weighted Regression and Scatterplot
Smoothing (LOWESS) procedure [Ref. 3], described in the
remainder of this paper, is a very good method for
preventing the acceptance of assumptions like the one that
led to using the quadratic model in Figure 1.4. The LOWESS
smoothing technique applied to this data, the right hand
plot of Figure 1.6, shows very clearly that the dependence
of Y on X resembles a combination of three distinct linear
functions (the parameter F = .25 will be explained later).
The LOWESS smoothing process has a tendency to round angular
corners. The straight lines in the center of each segment
suggest linear trends similar to those contained in Figure
1.5.

The major problem with trying to use polynomials to
depict subtle trends or to describe unusual relationships in
a data set is that they are neither flexible nor local. By
way of example, the points on either extreme of the first of
the two plots in Figure 1.6 have a significant effect on
the middle of the fitted polynomial.
Figure 1.6   Comparison of a Quadratic Regression and LOWESS
Smoothing (F = .25) on First 200 Points of Test Set Two.


The  LOWESS  procedure  on  the  other  hand,  allows  the  data 
points  themselves  to  determine  the  shape  of  the  smoothed 
curve.  Figure  1.6  also  demonstrates  that  global  polynomial 
regressions  have  a  more  difficult  time  following  abrupt 
pattern  changes  than  do  local  smoothing  procedures. 

B.   SCOPE 

Locally Weighted Regression and Scatterplot Smoothing
(LOWESS), introduced by William S. Cleveland in 1977
[Ref. 3], is a generalized extension of the locally fitted
polynomial smoothing techniques used for many years in the
field of time series¹ analysis.

The essential idea behind the simplest of these classical
smoothing techniques is the following. If the data
points (Xi,Yi) come from an additive model of the form

$$Y_i = G(X_i) + \epsilon_i$$

where $E(\epsilon_i) = 0$ and $\operatorname{Var}(\epsilon_i) = \sigma^2$, and G(Xi) can be
approximated locally, over the interval i-M, ..., i, i+1, ..., i+M,
by the linear function

$$Y_i = B_0(X_i) + B_1(X_i) \times X_i + \epsilon_i$$

then averaging the Yi over this range yields

$$\hat{Y}_i = \frac{1}{2M+1} \sum_{j=-M}^{M} Y_{i+j}$$

where

$$E(\hat{Y}_i) = B_0(X_i) + B_1(X_i) \times X_i$$

$$\operatorname{Var}(\hat{Y}_i) = \frac{\sigma^2}{2M+1}$$


If the assumption that the εi are uncorrelated is true, then
this moving average process produces estimated Ŷi's that are
unbiased and have smaller variance than the raw Yi's. This
technique makes it easier to distinguish G(Xi) through the
noise (εi). Using a bandwidth, M, larger than the interval
over which the linearity assumption holds will introduce
bias into the results [Ref. 4].

¹ A time series is a sequence of random variables Yi which
are naturally ordered by time (i) and can therefore be
presented as a scatter plot of Yi versus i. Although i is
usually the integers, missing values can occur.
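
The uniform moving average described in this section can be
illustrated with a short sketch. The Python fragment below is not
one of the thesis programs, which were written in APL and FORTRAN;
the function name moving_average and the sample values are
assumptions chosen for illustration. It applies the weight
1/(2M+1) to every point in the window and, like most classical
smoothers, produces no value for the M points at either end of
the series.

```python
def moving_average(y, m):
    """Uniform moving average with half-width m (window of 2m + 1 points).

    Only interior points i = m .. len(y) - m - 1 receive a smoothed value.
    """
    if m < 0:
        raise ValueError("m must be nonnegative")
    width = 2 * m + 1
    return [sum(y[i - m:i + m + 1]) / width for i in range(m, len(y) - m)]


# Hypothetical series of seven observations, smoothed with M = 1.
y = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0]
smoothed = moving_average(y, 1)
```

With M = 1 the seven observations yield only five smoothed values,
since the first and last points have no complete window.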

The purpose of this thesis is to translate the
generalization of classical smoothing techniques proposed by
Cleveland [Ref. 3], and expounded upon by Chambers et al.
[Ref. 1], into user friendly computer programs available for
use as exploratory data analysis tools by students and
faculty of the Naval Postgraduate School.

LOWESS, written in APL, an acronym for "A PROGRAMMING
LANGUAGE," was designed to be used alone or in conjunction
with the IBM GRAFSTAT statistical graphics package.
GRAFSTAT, an experimental program currently under
development by the IBM Watson Research Center, is available
at the Naval Postgraduate School for test and evaluation
purposes [Ref. 5]. All graphs contained in this paper were
produced by the GENERAL PLOT function of the GRAFSTAT program.

LOWS,  a  modification  of  LOWESS,  when  used  in  conjunction 
with  GRAFSTAT  and  expanded  versions  of  the  DRAFTSMAN  DISPLAY 
programs  described  in  [Ref.  6],  enhances  an  already  powerful 
exploratory  data  analysis  package. 

A  FORTRAN  version  of  the  basic  LOWESS  program  was 
designed  to  be  used  in  conjunction  with  either  DISPLA 
[Ref.  7],  or  any  other  W.R.  Church  computer  system  supported 
graphing  package. 

These programs are interactive and can be used easily by
individuals who have little or no APL or FORTRAN programming
skills. Users who are well versed in these languages should
be able to modify them to provide tailor made outputs,
expand their capabilities or incorporate them into other
analysis packages.

Detailed  user  instructions  are  contained  in  Chapters  IV 
and  V  while  examples  of  their  use  are  presented  in  Chapter 
III.  Users  who  are  interested  in  the  mathematical  details 
of  Robust  Locally  Weighted  Regression  and  Scatterplot 
Smoothing  should  read  Chapter  II. 
II.  TECHNICAL DESCRIPTION OF LOWESS

A.   OVERVIEW 

Locally Weighted Regression Scatterplot Smoothing
(LOWESS) is a generalized extension of locally fitted
polynomial smoothing techniques used by many statisticians in
time series analysis¹. Unlike its predecessors, however,
LOWESS was designed to work on unequally as well as equally
spaced data. It also contains a robust fitting procedure
that guards against possible distortion of the smoothed
curve by outlier points. The general procedure used by
Cleveland is an adaptation of iterated least squares
regression techniques developed by Albert Beaton and John
Tukey [Ref. 8].

The overall objective of LOWESS, like most smoothing or
regression routines, is to compute a "fitted" value, Ŷ, that
depicts the middle of the empirical distribution of Y at
each X. Unfortunately, most data sets do not contain enough
repeated observations at each X to provide a good estimate
of the middle of this distribution. LOWESS derives its
estimate of Ŷ from the equation of a weighted least squares
regression line fitted to a set of data points whose X
values are located in a user defined neighborhood about Xi
(the X value of the point being smoothed).

B.   MATHEMATICAL DETAILS: NON-ROBUST LOWESS SMOOTHING

The first step in generating a LOWESS smoothed point
consists of forming a neighborhood, Figure 2.1, centered
around Xi and comprised of its Q nearest neighbors. The user
determines Q by choosing the parameter F, which is
approximately equal to the fraction of the data points
used in computing each fitted value. Q is (F × N) rounded to
the nearest integer, and the Q nearest neighbors are those
points whose X values are closest to Xi. Note that there
are not necessarily an equal number of neighborhood points
on either side of Xi. Also, Xi is considered to be a
neighbor of itself. The parameters F and Q, determined
prior to smoothing the first data point, are held constant
and used throughout the procedure.

¹ A brief theoretical explanation of these techniques was
presented in Chapter I.
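
The neighborhood selection just described can be sketched as
follows. This is an illustrative Python fragment, not the thesis
code; the function name nearest_neighbors, the rule for breaking
ties in distance (lower index first) and the evenly spaced
abscissas 1 through 20 are all assumptions made for the example.

```python
def nearest_neighbors(x, i, f):
    """Return the sorted indices of the Q nearest neighbors of x[i].

    Q is F * N rounded to the nearest integer; x[i] counts as its own
    neighbor because its distance to itself is zero.
    """
    n = len(x)
    q = max(1, min(n, int(f * n + 0.5)))
    # Rank every point by distance along the X axis. Ties are broken in
    # favor of the lower index (an assumed rule, not taken from the thesis).
    ranked = sorted(range(n), key=lambda k: (abs(x[k] - x[i]), k))
    return sorted(ranked[:q])


x = list(range(1, 21))                      # hypothetical abscissas X1..X20
strip_x6 = nearest_neighbors(x, 5, 0.5)     # X6 is index 5
strip_x20 = nearest_neighbors(x, 19, 0.5)   # X20 is index 19
```

For these assumed abscissas the strip about X6 runs from X1 to X10,
while the ten neighbors of the end point X20 all lie to its left.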


Figure 2.1    Vertical Strip Containing the 10 Nearest
Neighbors of X6 in Data Set Two.

In Figure 2.1, the point to be smoothed, X6, is
highlighted by a dotted line and the strip boundaries are
delineated by solid lines passing through X1 and X10.

STEP TWO consists of defining the local weighting
function and calculating individual weights for each point,
(Xk,Yk), in the strip formed during STEP ONE. This weighting
function is to be centered at Xi and scaled so that it hits
zero for the first time at the Qth nearest neighbor of Xi
(the strip boundary furthest from Xi). Functions having the
following properties will satisfy these requirements:

1.  W(U) > 0 for |U| < 1 (positivity),

2.  W(-U) = W(U) (symmetry),

3.  W(U) is a nonincreasing function for U ≥ 0,

4.  W(U) = 0 for |U| ≥ 1.


Cleveland [Ref. 3] suggests using a tricube weight
function of the form:

$$W(U) = \begin{cases} (1 - |U|^3)^3 & \text{for } |U| < 1 \\ 0 & \text{otherwise} \end{cases}$$

Note that this function uses the absolute value of U. The
weight given to any point within the strip is calculated by:

$$W_k(U) = W\left(\frac{X_i - X_k}{D_i}\right)$$

The variable Di is the distance along the X axis from Xi to
its Qth nearest neighbor. This is the distance from X6 to
the left hand boundary in Figure 2.1. When LOWESS starts
its smoothing pass at X1, the right hand boundary passes
through its Qth nearest neighbor, X10 in this example. The
neighborhood which, at that time, contains the points X1 ...
XQ remains fixed until the distance (Xi - X1) is greater than
(XQ - Xi). This usually occurs at i = Q/2 for evenly spaced
data. At this point the neighborhood is advanced and the Qth
nearest neighbor shifts to the left hand boundary where it
remains until all of the data points have been smoothed. Di,
therefore, is generally the distance from Xi to the right
hand boundary for i = 1 ... (Q/2) and is the distance from Xi
to the left hand boundary for i = (Q/2) ... N.
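
A minimal sketch of the tricube weighting follows, written in
Python for illustration rather than taken from the thesis
programs. The function names tricube and neighborhood_weights, the
evenly spaced abscissas and the choice to return weight 1 for every
point when Di is zero are assumptions of the example.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def neighborhood_weights(x, i, neighbors):
    """Map each neighbor index k to its weight W((x[i] - x[k]) / D_i).

    D_i is the distance from x[i] to the farthest point in the strip.
    """
    d_i = max(abs(x[k] - x[i]) for k in neighbors)
    if d_i == 0.0:
        # Every abscissa in the strip equals x[i]; all points get weight 1.
        return {k: 1.0 for k in neighbors}
    return {k: tricube((x[i] - x[k]) / d_i) for k in neighbors}


x = [float(v) for v in range(1, 21)]           # hypothetical X1..X20
weights = neighborhood_weights(x, 5, range(10))  # strip X1..X10 about X6
```

In this example Di is the distance from X6 to X1, so X1 receives
weight zero, X6 receives weight one, and the weights fall off
smoothly in between.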

The weight given to any point in the strip is equal to
the height of the curve, W(U), at Xk, Figure 2.2. This
figure demonstrates that the tricube weight function:

1.  gives the largest weight to the point being smoothed;

2.  decreases smoothly as Xk moves away from Xi;

3.  is symmetric about the point being smoothed;

4.  hits zero for the first time at the Qth nearest
neighbor of Xi.

Figure 2.2   TRICUBE Weight Function for the 10 Nearest
Neighbors of X6 in Data Set Two.


In cases where several points have abscissas equal to
Xi, all of them are given weight 1. If Di is zero, meaning
that all Q points in the strip have abscissas equal to Xi,
it is impossible to estimate the slope of a fitted line. In
this instance, a constant equal to the mean Y value for all
Q points is fitted to the point (Xi,Yi).

STEP THREE uses weighted least squares regression to fit
a polynomial of degree P to the data points that lie within
the strip containing Xi. The parameters of the equation
that describes this line are the values of Bj, j = 0, 1, ..., P,
that minimize:

$$\sum_{k=1}^{Q} W_k(U) \left( Y_k - B_0 - B_1 X_k - \cdots - B_P X_k^P \right)^2$$
Figure 2.3 shows straight (P = 1) and quadratic (P = 2) lines
fit to the neighborhood points surrounding X6 in data set
two.

Figure 2.3   Linear and Quadratic Fits.

The choice of an appropriate P depends on the user's
perception of the relationship between the points within
each neighborhood, the need for flexibility to reproduce
patterns in the data, and computational ease. The existence
of physical theories that define the relationships as being
nonlinear might also influence this choice. Smoothed curves
based on higher order polynomial regressions tend to follow
abrupt pattern changes better than those based on linear
models. Cleveland [Ref. 3] feels that computational
considerations begin to override the need for flexibility
for values of P greater than 1.

The smoothing routine written for this thesis is capable
of performing linear or quadratic regressions. Using P = 1
or 2 should provide adequately smoothed points for any data
set.

The final step in the Locally Weighted Regression
portion of the LOWESS procedure is the determination of the
smoothed point (Xi,Ŷi), Figure 2.4, where:

$$\hat{Y}_i = \sum_{j=0}^{P} B_j(X_i) \, X_i^j$$

The notation used here emphasizes that the coefficients of
the Xi are different for each point Xi.

Figure 2.4    Scatter Plot of Data Set Two Superimposed
With Smoothed Point (X6,Y6).
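
For the linear case (P = 1) the weighted least squares fit has a
simple closed form, sketched below in Python. This is an
illustration under stated assumptions rather than the thesis
routine: the function name local_linear_fit and the sample numbers
are hypothetical, and the constant fit is applied whenever the
weighted points show no spread in X, which covers the Di = 0 case
described earlier.

```python
def local_linear_fit(xs, ys, ws, x0):
    """Weighted least squares line (P = 1) through a strip, evaluated at x0."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        # No spread in X among the weighted points: fit a constant,
        # the weighted mean of Y.
        return ybar
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx            # slope B1
    b0 = ybar - b1 * xbar     # intercept B0
    return b0 + b1 * x0


# Points lying exactly on Y = 2 + 3X are reproduced whatever the weights.
on_line = local_linear_fit([1.0, 2.0, 3.0, 4.0], [5.0, 8.0, 11.0, 14.0],
                           [0.2, 1.0, 0.7, 0.1], 2.0)
# Three points sharing one abscissa fall back to the mean of Y.
tied = local_linear_fit([2.0, 2.0, 2.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 2.0)
```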


LOWESS  differs  from  most  other  smoothing  routines 
because  it  smooths  all  of  the  data  points.  This  becomes 
important  when  smoothing  small  data  sets,  when  important 
pattern  changes  take  place  near  the  ends  of  the  data  set,  or 
when  the  smoothed  curve  is  to  be  used  as  a  regression  line 
to  predict  future  trends.  Figure  2.5  summarizes  the  sequence 
of steps described above, as they are used to compute a
"fitted"  value  for  (X20,Y20),  the  right  hand  end  point  in 
data  set  two. 

Figure 2.5   Summary of Steps Required for Computing the
Smoothed Value at (X20,Y20) in Data Set Two.

A comparison of Figures 2.1 and 2.5 reveals that the
widths of the vertical strips about (X6,Y6) and (X20,Y20)
are not equal. Note that the ten nearest neighbors of X20
are all to the left. Although both strips contain ten data
points, the requirement to center them around their
respective (Xi,Yi) points forces the right hand portion of
the weighting function in Figure 2.5 to fall off scale. The
left hand portion of the weighting function for (X1,Y1) is
forced off scale for the same reason. These partial
weighting functions still fulfill all of the requirements
outlined earlier, however. Unequal spacing of the X's also
creates variable strip widths.

A set of smoothed data points, Figure 2.6, is obtained
by completing the aforementioned steps for each point in the
original data set.

Figure 2.6    Plots of LOWESS Smoothed Data Points and
Smoothed Curve Superimposed on Data Set Two, (F = .5).
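
The complete non-robust pass can be summarized in one short
routine. The Python sketch below is an illustration only: the
function names, the linear (P = 1) local fit, the tie-breaking
rule and the test data are assumptions, and the thesis programs
themselves were written in APL and FORTRAN.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def lowess_pass(x, y, f):
    """One non-robust locally weighted linear regression pass.

    Returns a fitted value for every point, including both end points.
    """
    n = len(x)
    q = max(1, min(n, int(f * n + 0.5)))       # Q = F * N, rounded
    fitted = []
    for i in range(n):
        # STEP ONE: the Q nearest neighbors of x[i], nearest first.
        strip = sorted(range(n), key=lambda k: (abs(x[k] - x[i]), k))[:q]
        d_i = abs(x[strip[-1]] - x[i])         # distance to Qth neighbor
        if d_i == 0.0:
            # All abscissas in the strip equal x[i]: use the mean of Y.
            fitted.append(sum(y[k] for k in strip) / q)
            continue
        # STEP TWO: tricube weights centered at x[i].
        w = [tricube((x[i] - x[k]) / d_i) for k in strip]
        # STEP THREE: weighted least squares line through the strip.
        sw = sum(w)
        xbar = sum(wk * x[k] for wk, k in zip(w, strip)) / sw
        ybar = sum(wk * y[k] for wk, k in zip(w, strip)) / sw
        sxx = sum(wk * (x[k] - xbar) ** 2 for wk, k in zip(w, strip))
        sxy = sum(wk * (x[k] - xbar) * (y[k] - ybar) for wk, k in zip(w, strip))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        # FINAL STEP: evaluate the local line at x[i].
        fitted.append(ybar + slope * (x[i] - xbar))
    return fitted


x = [float(v) for v in range(1, 21)]   # hypothetical abscissas
y = [2.0 * v + 1.0 for v in x]         # noise-free straight line
fitted = lowess_pass(x, y, 0.5)
```

Because the assumed data lie exactly on a straight line, every
fitted value reproduces the corresponding Y, and the routine
returns as many smoothed points as there are data points.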


C.   MATHEMATICAL DETAILS: ROBUST LOWESS SMOOTHING

The robust smoothing feature of LOWESS prevents a small
number of outliers from distorting the smoothed curve. The
point (X10,Y10) in Figure 2.1 is one such outlier.

The robust procedure computes a new set of weights for
each (Xi,Yi) based on the size of the residuals, (Yi - Ŷi),
obtained after the first smoothing pass, Figure 2.7.

Cleveland [Ref. 3] suggests using a bisquare function
of the form:

$$D(V) = \begin{cases} (1 - V^2)^2 & \text{for } |V| < 1 \\ 0 & \text{otherwise} \end{cases}$$

Robustness weights for each point are calculated by

$$D_k(V) = D\left(\frac{R_k}{6M}\right)$$

where Rk is the residual (Yk - Ŷk) and M is the median of
the absolute values of the residuals, Figure 2.8. This is
sometimes referred to as the Median Absolute Deviation (MAD).
Figure 2.7    Residuals (Yi - Ŷi) Versus Xi for the
Non-Robust Smoothed Points of Data Set Two.

Figure 2.8    Robust Weighting Function for the First
Pass Through Data Set Two.

This scheme gives small weights to points associated
with large residuals and large weights to points with small
residuals. One iteration of the robust locally weighted
regression procedure is completed by calculating a new set
of "fitted" values using the weighting function

$$WT = W(U) \times D(V)$$

in step three.
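
The robustness weights can be sketched in a few lines of Python.
The fragment is illustrative only; the function names, the sample
values and the handling of a zero median (weight 1 for a zero
residual, 0 otherwise) are assumptions not taken from the thesis.

```python
from statistics import median


def bisquare(v):
    """Bisquare weight: (1 - v^2)^2 for |v| < 1, and 0 otherwise."""
    return (1.0 - v * v) ** 2 if abs(v) < 1.0 else 0.0


def robustness_weights(y, y_hat):
    """Return D(R_k / 6M) for each residual R_k = y[k] - y_hat[k].

    M is the median of the absolute residuals.
    """
    residuals = [yk - fk for yk, fk in zip(y, y_hat)]
    m = median(abs(r) for r in residuals)
    if m == 0.0:
        # Assumed convention when at least half the residuals are zero.
        return [1.0 if r == 0.0 else 0.0 for r in residuals]
    return [bisquare(r / (6.0 * m)) for r in residuals]


# Hypothetical observations and first-pass fitted values; the last
# observation is an outlier.
y = [1.0, 2.0, 3.0, 4.0, 20.0]
y_hat = [1.5, 2.5, 2.5, 4.5, 5.0]
d = robustness_weights(y, y_hat)
```

Here the median absolute residual is 0.5, so any residual of 3 or
more in absolute value receives weight zero. Multiplying each d[k]
by the corresponding tricube weight gives the combined weight used
in the next pass.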
Execution of the entire LOWESS algorithm, consisting of
one locally weighted regression pass and two robust locally
weighted regression passes, produces a robust smoothed curve,
Figure 2.9. The effect of the "outlier" can be seen very
clearly.

Figure 2.9   Comparison of Non-Robust and Robust LOWESS
Smoothing of Data Set Two, (F = .5).

Cleveland [Ref. 3] reports that the number of
computations required to complete the LOWESS algorithm on an
entire data set is on the order of FN². For example, 60 linear
regressions were used to complete the robust smoothing of
the 20 artificial data points in Figure 2.9. The non-robust
curve, on the other hand, required 2/3 fewer calculations
and took less than 1/2 the time. The number of calculations
required to produce a smoothed curve presents no significant
problem for plots of fewer than 100 points. Computational
time can be saved by grouping the Xi's on data sets that
have repeated X values. This saving results from the fact
that if Xi+1 = Xi then Ŷi+1 = Ŷi. Assigning the same Ŷi
value to each of the Ni repeated Xi's reduces the number of
regressions required by Ni for non-robust smoothing and by
3Ni for robust smoothing.
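
The regression counts quoted above follow from simple arithmetic,
sketched here under the assumption that each pass fits one
regression per data point (N = 20) and that the robust algorithm
makes one initial pass plus two robust passes, as described
earlier:

```latex
\begin{aligned}
\text{non-robust:}\quad & 20 \times 1 = 20 \ \text{regressions} \\
\text{robust:}\quad & 20 \times (1 + 2) = 60 \ \text{regressions} \\
\text{reduction:}\quad & \frac{60 - 20}{60} = \frac{2}{3}
\end{aligned}
```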
D.   CHOOSING F

There are no set criteria for choosing F. Small values
produce curves with high resolution and a lot of noise.
Larger F's produce curves with low resolution and less
noise, but require increased computational time. In
general, increasing F tends to produce smoother curves,
Figure 2.10. Cleveland [Ref. 3] suggests that values
between .2 and .8 should be satisfactory for most purposes.
The goal is to choose the largest F that minimizes the
variability in the smoothed points without distorting patterns
in the data. Computational time may become a consideration
in choosing F when smoothing large data sets. In general
though, F will decrease as the series length increases.


Figure 2.10    Comparison of Robust LOWESS Smoothing of
Data Set Two for Different Values of F.
Smoothing routines, LOWESS included, do not provide
regression equations or other analytical results on which to
test goodness of fit. The user must judge the adequacy of
the results. The choice of F is not so critical for cases in
which the purpose of the smoothing is to enhance the visual
perception of gross patterns in the data. For example, the
rough curve obtained by using F = .2 on data set two, the left
hand plot of Figure 2.10, provides an adequate picture of an
overall increasing trend. More care must be taken in some
applications, such as time series analysis, or when the
smoothed (Xi,Ŷi) values may be used as a type of regression
function, or finally, when the smoothed curve may be
presented without an accompanying plot of the original data
points. Taking F = .5 is a reasonable choice when there is no
clear idea of what is needed [Ref. 3]. Chambers [Ref. 1]
suggests that it is often wise to try several values of F
before selecting the "best" one for a particular
application.

Techniques for determining bandwidth by cross-validation
have been considered by Cleveland [Ref. 3] and Rice
[Ref. 9], but are not included here.
III.  EVALUATION OF THE LOWESS CURVE SMOOTHING PROGRAM

A.  GENERAL 

Smoothing routines are generally used to filter noisy
data and approximate underlying relationships that may be
too complex to describe mathematically or too difficult to
fit by simple polynomial regression. Effective routines
must be flexible and local. They must allow the data to
determine the shape of the smoothed curve and they must be
able to follow abrupt as well as smooth changes in
curvature. This evaluation will test LOWESS in each of these
areas.

B.  METHODOLOGY 

LOWESS, like most other curve smoothing schemes,
provides no analytical solutions by which to measure its
effectiveness. The correctness or adequacy of the fit must
be judged subjectively, and there are no standard guidelines
to follow. Sometimes the shape of the fit can be checked by
comparing it to the physical laws that govern the
application at hand. The programs written to support this
thesis were evaluated by:

1.  examining  their  performance  on  a  set  of  test  data  for 
which  the  underlying  functional  relationships  were 
known; 

2.  comparing their results with those obtained from
widely used and previously validated curve smoothing
techniques, namely: LEAST SQUARES REGRESSION, MOVING
AVERAGE and COSINE ARCH weighted smoothing.

The theory of moving average procedures dates back to
definitive studies of discrete time series models completed
by H. Wold in the mid 1930's. The general process is based
on the assumptions and theories recounted in Chapter I. The
moving average is defined by the expression

$$X(T) = \sum_{j=-M}^{N} A_j \, Z(T - j), \qquad T = 0, 1, \ldots$$

where M and N are nonnegative integers and the weighting
coefficients Aj are real constants. Kendall and Stuart
[Ref. 4] and Koopmans [Ref. 10] present in depth
discussions and theoretical derivations that expand on the
ideas presented in Chapter I. The moving average routine
employed in this analysis is contained in the IBM GRAFSTAT
statistical graphics package. The weighting function used in
that program takes the form

$$A_j = \frac{1}{M}, \qquad j = -M, \ldots, N$$

The COSINE ARCH smoothing procedure used here is a
moving average process that uses a cosine weighting function
of the form

$$A_j = \frac{1}{M+1} \left[ 1 - \cos\frac{2\pi (j+1)}{M+1} \right], \qquad j = 0, 1, \ldots, N-1$$

It is characterized as a good smoother by Anscombe
[Ref. 11], and is often used as a trend remover during time
series analysis.

C.   TESTING  PROCEDURES  AND  RESULTS 

Three sets of test data were developed to check all
aspects of the LOWESS program's capabilities: its ability to
follow linear trends as well as abrupt and smooth changes in
curvature.

1.   Phase One: Linear Trends

Test set one, Figure 3.1, consists of 150 data
points having the following functional relationship:

Y = X + NORMAL(0,1) NOISE    0 < X ≤ [upper limit illegible]

It was designed to test LOWESS' ability to detect linear
trends in noisy data. Although this test appears redundant,
many complex smoothing procedures have failed because they
did not return straight lines when that was the shape of the
underlying curve.


Figure 3.1    Test Set One With and Without N(0,1) Noise.

The adequacy of LOWESS' performance on test set one
was measured by comparing it with a linear least squares
regression line fitted to the same data.

As pointed out in Chapter II, LOWESS produces
increasingly smoother curves as the parameter F approaches
1. When F = 1, each neighborhood used throughout the smoothing
process contains N × 1 = N points. This implies that each
smoothed point (Xi,Ŷi) is computed from the equation of the
TRICUBE weighted regression line fitted to all of the data.
This procedure should produce a LOWESS smoothed curve that
closely resembles the linear regression of Y on X. The
TRICUBE weighting function used in LOWESS may cause minor
disparities between the two "fits," however. A visual
inspection of the bottom two plots in Figure 3.2 reveals
that LOWESS and the linear regression produced nearly
identical "fits."
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Figure  3.2   Comparison  of  LOiESS  Smoothing  and  Linear 
Regression  of  Test  Set  One. 
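
The local fitting step discussed above can be sketched in a few lines of Python. This is a minimal, non-robust illustration under stated assumptions, not the thesis's APL or FORTRAN code: the names `tricube` and `lowess_linear` are hypothetical, the neighborhood size is taken as round(F*N), and the bandwidth is the distance to the farthest point in the neighborhood.

```python
def tricube(u):
    """Tricube weight (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_linear(x, y, f):
    """Non-robust locally weighted linear smooth evaluated at every x[i]."""
    n = len(x)
    q = max(2, min(n, int(round(f * n))))  # assumed neighborhood size
    fitted = []
    for xi in x:
        # Distance to the q-th nearest point sets the local bandwidth.
        h = sorted(abs(xj - xi) for xj in x)[q - 1]
        h = h if h > 0.0 else 1.0
        w = [tricube((xj - xi) / h) for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        # Evaluate the weighted least squares line at the target point.
        fitted.append(my + slope * (xi - mx))
    return fitted


x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]   # noise-free straight line
fit_all = lowess_linear(x, y, 1.0)
fit_half = lowess_linear(x, y, 0.5)
```

On noise-free linear data a weighted linear fit reproduces the line for any F, which is the property the Phase One test relies on; with noisy data the F=1 smooth and ordinary regression differ only through the tricube weights.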


Goodness of fit can be measured by examining the residuals
(Yi-Ŷi) from each smoothing procedure. A perfect reproduction
of the underlying functional relationship, Y = X, would
produce a set of residuals distributed Normal(0,1), the same
distribution found in the noise. The results of the GRAFSTAT
distribution fitting procedure summarized in Table II
indicate that the distribution of the regression residuals
can be approximated as Normal(0,1.04) while the LOWESS
residuals are approximately Normal(.002,1.016).

Hypothesis  tests  comparing  the  means  and  variances 
of  these  distributions  with  those  of  the  Normal  (0,1) 
distributed  noise,  will  provide  some  measure  of  the  goodness 
of  fit  of  each  smoothing  scheme.  The  results  of  these 
tests, conducted at the 95% confidence level, are summarized
in  Table  I. 

The  output  of  the  GRAFSTAT  distribution  fitting 
procedure  presented  in  Table  II  and  the  hypothesis  tests 
summarized  in  Table  I,  suggest  that  there  is  no  significant 
difference  between  the  distribution  of  the  residuals  from 
the  linear  regression  or  LOWESS  smoothing  of  test  set  one, 
and  the  Normal  (0,1)  noise  incorporated  into  the  data.  This 
provides  strong  support  for  the  premise  that  LOWESS  depicts 
linear  trends  very  well.  Visual  comparison  of  the  LOWESS 
smooths  in  Figure  3.2  confirms  that  LOWESS  follows  the  same 
general  trend  regardless  of  what  F  is  used;  small  values 
provide  rougher  curves  that  have  the  same  general  slope. 
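
The thesis does not state the formulas behind the test statistics in Table I. The Python sketch below assumes the usual large-sample normal approximations for a mean with known variance and for a variance of normal data; under that assumption the computed values come out close to the tabulated T values. The function names are hypothetical.

```python
import math


def z_mean(sample_mean, mu0, sigma0, n):
    """Z statistic for H0: mean = mu0 with known standard deviation sigma0."""
    return abs(sample_mean - mu0) / (sigma0 / math.sqrt(n))


def z_variance(sample_var, var0, n):
    """Large-sample Z statistic for H0: variance = var0 (normal data)."""
    return abs(sample_var - var0) / (var0 * math.sqrt(2.0 / n))


Z_CRIT = 1.96  # two-sided critical value at the 5% level

# Values taken from the text: LOWESS residual mean and regression residual variance.
t_mean_lowess = z_mean(0.002, 0.0, 1.0, 150)
t_var_regression = z_variance(1.040, 1.0, 150)
accept = t_mean_lowess < Z_CRIT and t_var_regression < Z_CRIT
```

Both statistics fall well below 1.96, so under these assumed formulas neither hypothesis is rejected.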


TABLE I

Comparison of the Means and Variances of Residuals
From Smooths of Test Set One to the Normal(0,1) Noise

smooth         statistic  residuals  noise  T      Z(1-α/2)  result  p
linear rgrsn   mean       0.000      0      0.000  1.96      accept  0.05
linear rgrsn   var        1.040      1      0.346  1.96      accept  0.07
LOWESS         mean       0.002      0      0.024  1.96      accept  0.05
LOWESS         var        1.016      1      0.138  1.96      accept  0.06




TABLE II

Summary of GRAFSTAT Distribution Fitting of Residuals
from Regression and LOWESS Smooths of Test Set One

RESIDUALS FROM LINEAR REGRESSION

NORMAL DISTRIBUTION

SELECTION      ALL
LABEL          RESD
SAMPLE SIZE    150
MINIMUM        -2.846
MAXIMUM        3.151
CENSORING      NONE
EST. METHOD    MAXIMUM LIKELIHOOD

               SAMPLE                 FITTED
MEAN           2.0898E-[illegible]    2.0898E-[illegible]
STD DEV        1.0295E0               1.0295E0
SKEWNESS       1.1908E-1              0.0000E0
KURTOSIS       3.1359E0               3.0000E0

PERCENTILES    SAMPLE       FITTED
 5             -1.7375      -1.6938E0
10             -1.3381      -1.3196E0
25             -0.59152     -6.9409E-1
50             -0.032298    1.0399E-7
75             0.63234      6.9409E-1
90             1.3208       1.3196E0
95             1.7182       1.6938E0

COVARIANCE MATRIX OF PARAMETER ESTIMATES
               MU           SIGMA
MU             0.0070189    0
SIGMA          0            0.003533

GOODNESS OF FIT
CHI-SQUARE     2.3078
DEG FREED      5
SIGNIF         0.80513
KOLM-SMIRN     0.040266
SIGNIF         0.96816
CRAMER-V M     0.027624
SIGNIF         > .15
ANDER-DARL     0.17006
SIGNIF         > .15

KS, AD, AND CV SIGNIF. LEVELS NOT EXACT WITH ESTIMATED PARAMETERS

0.95 CONFIDENCE INTERVALS
PARAMETER      ESTIMATE               LOWER       UPPER
MU             2.0898E-[illegible]    -0.16424    0.16424
SIGMA          1.0295E0               0.92471     1.1613
RESIDUALS FROM LOWESS SMOOTHING

NORMAL DISTRIBUTION

LOWESS RESIDUALS

SELECTION      ALL
LABEL          LOWRES
SAMPLE SIZE    150
MINIMUM        -2.909
MAXIMUM        3.090
CENSORING      NONE
EST. METHOD    MAXIMUM LIKELIHOOD

               SAMPLE       FITTED
MEAN           0.016268     0.016268
STD DEV        1.0237       1.0237
SKEWNESS       0.093313     0
KURTOSIS       3.1452       3

PERCENTILES    SAMPLE       FITTED
 5             -1.6646      -1.6679
10             -1.3315      -1.2958
25             -0.55317     -0.6739
50             0.010179     0.016268
75             0.64998      0.70643
90             1.2874       1.3284
95             1.7125       1.7005

COVARIANCE MATRIX OF PARAMETER ESTIMATES
               MU           SIGMA
MU             0.0069398    0
SIGMA          0            0.0034932

GOODNESS OF FIT
CHI-SQUARE     1.4385
DEG FREED      5
SIGNIF         0.92006
KOLM-SMIRN     0.047238
SIGNIF         0.89136
CRAMER-V M     0.030631
SIGNIF         > .15
ANDER-DARL     0.18198
SIGNIF         > .15

KS, AD, AND CV SIGNIF. LEVELS NOT EXACT WITH ESTIMATED PARAMETERS

0.95 CONFIDENCE INTERVALS
PARAMETER      ESTIMATE     LOWER       UPPER
MU             0.016268     -0.14704    0.17958
SIGMA          1.0237       0.91948     1.1548

2.   Phase Two: Abrupt Changes in Curvature

Test set two, Figure 3.3, consisting of 220 data points
having the following mathematical relationship

    Y =  .4X + NORMAL(0,1) NOISE             0 ≤ X ≤ 10
         3 + .1X + NORMAL(0,1) NOISE         10 < X ≤ 25
         14.6 - 3.67X + NORMAL(0,1) NOISE    25 < X ≤ 40
         0 + NORMAL(0,1) NOISE               40 < X ≤ 44

was used to test LOWESS' ability to handle abrupt pattern
changes. The smooth of test set two generated by LOWESS was
compared to those produced by MOVING AVERAGE and COSINE ARCH
filtering of the same data.


Figure 3.3   Test Set Two With and Without N(0,1) Noise.

Determining the amount of smoothing required by a data set
is, perhaps, the most difficult aspect of using any curve
smoothing routine. Smoothness is controlled by the size of
the parameter F in LOWESS and by the parameter M (bandwidth)
in MOVING AVERAGE and COSINE ARCH smoothing. These parameters
determine the number of points, or neighborhood size, used to
compute each smoothed value. The goal, regardless of the
method chosen, is to use the largest neighborhood that
minimizes the variability in the smoothed points without
distorting patterns in the data. Another factor that must
also be considered when choosing M is that MOVING AVERAGE and
COSINE ARCH smoothing routines produce only (N-M) smoothed
points. Using proportionately large values of M, therefore,
might result in losing significant portions of the original
pattern at the ends. This shortcoming will be evident in the
graphical comparisons made throughout the remainder of this
chapter.

Comparison  tests  made  during  phases  two  and  three  of 
this  evaluation  used  selected  LOWESS  smooths  and  corre- 
sponding MOVING  AVERAGE  and  COSINE  ARCH  smoothed  curves. 
Parameters for the three processes are directly convertible
by the relationship M = F·N.

Figure 3.4 presents graphical comparisons of LOWESS smooths
(solid line) using parameter values F = .15, .25, .50 and .75
to illustrate some of the considerations made during the
parameter selection phase of a smoothing operation. The exact
underlying relationships (dashed lines) were included to
demonstrate how large values of F can cause pattern
distortion.

It is apparent from the sequence of illustrations in Figure
3.4 that LOWESS produces smoother curves as F increases. The
smoothest curves are not always the most desirable, however.
The bottom two curves (F=.50 and F=.75) have distorted the
original pattern by using too many points to compute the
smoothed values. Test set two contains 50 points in the
segment (0<X<10). Using a neighborhood much larger than
220 · .25 = 55 points on this data set would have a tendency
to fit the wrong slope to the first linear segment.
Additionally, it would cause over smoothing of the corners.
Figure 3.5 shows the neighborhood and linear regression used
to smooth the point (X10,Y10) during production of the
smoothed curve (F=.75) pictured in the lower right corner of
Figure 3.4. It is easy to see that following this slope would
distort the pattern presented by the data.

Figure 3.4   Comparison of LOWESS Smoothing of Test Set Two
Using Different Values of the Parameter F.


Figure 3.5   Linear Regression Step in Smoothing (X10,Y10)
in Test Set Two Using LOWESS With F=.75.


The F=.15 plot depicted in Figure 3.4 demonstrates that small
F's create very locally smoothed curves that contain a great
deal of noise but follow gross patterns very well. Using a
small F is an excellent idea if the sole purpose of the
smoothing is to highlight major trends in the data.

The LOWESS smoothed curve obtained by using F=.25 is the one
best suited for comparison with corresponding MOVING AVERAGE
and COSINE ARCH smooths, Figure 3.6.

Figure 3.6   Comparison of LOWESS, MOVING AVERAGE
and COSINE ARCH Smoothing of Test Set Two.


Inspection of the plots in Figure 3.6 reveals that all of the
smoothing procedures fit similarly shaped curves to most of
the data. The inability of the MOVING AVERAGE and COSINE ARCH
routines to smooth the extreme edges of a plot precluded them
from fitting a curve to the last segment of test set two.
Practitioners of these routines often extend the curve or fit
the ends by eye. Applying these techniques to the bottom
curves in Figure 3.6 does not reveal any significant pattern
changes. LOWESS, although it does not follow the level trend
accurately, does reveal a major pattern change in the last
section of the data.

All three of the procedures have a tendency to round sharp
corners as the parameters F and M are increased. The MOVING
AVERAGE curve, in the lower left, has a very rounded shape
and does not highlight the linear trend in segments one or
two. The COSINE ARCH filter does a little better. It portrays
the linearity of section three with nearly the correct slope
but fits segments one and two with one smooth curve.
Additionally, it has added a misleading hump at the
intersection of segments two and three. LOWESS is the only
procedure that clearly pictures the underlying pattern as a
series of straight lines. An experienced user who understands
that LOWESS rounds corners could almost duplicate the
original pattern by connecting the linear portions of the
curve.

Smoothing procedures are not only judged on their ability to
depict patterns, but are also rated on their ability to
filter out unwanted noise. Gross differences in their
capabilities can be picked out easily in a graphical
comparison. It is readily apparent that the MOVING AVERAGE
curve in Figure 3.6 is much noisier than either the LOWESS or
COSINE ARCH smooths.

A more analytical measure of a procedure's smoothing ability
can be made by comparing periodograms of the unfiltered and
filtered data. A periodogram is an analysis technique used to
estimate the spectral density function of a time series at
periodic frequencies, λν. Refer to Koopmans [Ref. 10],
chapter 8, for the definition of the periodogram function and
a detailed discussion of its distributional properties.

Figure 3.7   Comparison of Periodograms of LOWESS, MOVING
AVERAGE and COSINE ARCH Smoothing of Test Set Two.

The periodograms in Figure 3.7 provide comparisons of the
filtering properties of each smoothing routine. The vertical
lines on each plot represent periodicities, the spectral
frequencies of which are measured along the abscissa. The
height of the lines is an indicator of the significance of
the associated frequencies. The plots in Figure 3.7 were
truncated at Y = 6 to prevent the obscuration of the minor
frequencies.
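
To make the idea of a periodogram concrete, the following Python sketch computes one directly from the discrete Fourier sum. The scaling used here (squared modulus divided by N) is an assumption; Koopmans and the thesis programs may normalize differently, which changes the height of the lines but not their location. The function name is hypothetical.

```python
import cmath
import math


def periodogram(y):
    """Return |sum_t y[t] exp(-2*pi*i*k*t/N)|^2 / N for k = 0..N//2."""
    n = len(y)
    values = []
    for k in range(n // 2 + 1):
        s = sum(yt * cmath.exp(-2j * math.pi * k * t / n)
                for t, yt in enumerate(y))
        values.append(abs(s) ** 2 / n)
    return values


n = 32
# A pure cosine completing three cycles over the record.
signal = [math.cos(2.0 * math.pi * 3 * t / n) for t in range(n)]
spectrum = periodogram(signal)
peak = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

A noise-free periodic signal concentrates its power in a single line, while added noise spreads small contributions across all frequencies; a good smoother removes most of those small lines.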

A visual inspection of these periodograms reveals that LOWESS
produces the smoothest (most noise free) curve. In fact, the
periodograms of the LOWESS curve and the noise free data are
nearly identical.

All  of  this  evidence  supports  the  conclusion  that 
LOWESS  performs  at  least  as  well  on  data  sets  that  contain 
abrupt  changes  in  curvature  as  do  the  widely  accepted  MOVING 
AVERAGE  and  COSINE  ARCH  procedures. 

3.   Phase  Three:  Smooth  Changes  in  Curvature 

Test set three, Figure 3.8, comprised of 100 data points
having the following relationship

Y = SIN X + NORMAL(0,1) NOISE,   0 ≤ X ≤ 2π

was used to evaluate LOWESS' ability to follow smooth changes
in curvature. The same procedures used in the preceding
section to test LOWESS' ability to handle abrupt pattern
changes were applied here.

Test set three appears either to have a negative linear
trend or to cycle about the line Y = 0. A series of LOWESS
smooths, Figure 3.9, starting with a small F parameter, was
used to discover the general pattern (dashed line) and refine
the resulting smoothed curve (solid line). The distorted
smooth in the lower right hand plot demonstrates the inherent
danger in selecting a large F if only one smoothing pass is
planned.

Figure 3.8   Test Set Three With and Without N(0,1) Noise.

Figure 3.9   Comparison of LOWESS Smoothing of Test Set
Three Using Different Values of the Parameter F.


The LOWESS curve obtained by using F=.25 provided the most
smoothing without distorting the pattern and was used in a
direct comparison with corresponding MOVING AVERAGE and
COSINE ARCH smooths, Figure 3.10. The LOWESS smooth is the
only curve that has the characteristic sinusoidal shape. The
MOVING AVERAGE plot, although very noisy, would present the
proper picture if the ends of the curve were extended. The
radical change in curvature on the left end of the COSINE
ARCH smoothed curve detracts from its ability to represent
the true shape of test set three.

Figure 3.10   Comparison of LOWESS, MOVING AVERAGE and
COSINE ARCH Smoothing of Test Set Three.


Comparison of the periodograms presented in Figure 3.11
shows, once again, that LOWESS produces the smoothest curve,
while Figure 3.10 shows that it seems to follow the model the
best.

Figure 3.11   Comparison of Periodograms of LOWESS, MOVING
AVERAGE and COSINE ARCH Smoothing of Test Set Three.

The graphical comparisons made in Figures 3.10 and 3.11
demonstrate clearly that LOWESS performs at least as well as
MOVING AVERAGE and COSINE ARCH routines when smoothing data
that has a smooth curvilinear pattern.

4.   Phase Four: Unequal Spacing

Besides being able to smooth all of the data points, LOWESS
enjoys another possible advantage over MOVING AVERAGE type
procedures, in that it was designed to work on unequally as
well as equally spaced data. The definition of MOVING
AVERAGES

$$\hat{Y}_i = \frac{1}{2m+1}\sum_{j=-m}^{m} Y_{i+j}$$

holds only if the Yi's are equally spaced and have a linear
relationship over the interval (i-m)...(i+m). Violation of
the linearity assumption introduces bias into the results
while violation of the equal spacing requirement invalidates
them. LOWESS would indeed enjoy a distinct advantage over
MOVING AVERAGE type smoothing procedures if it produces
acceptable results on irregularly spaced data.

This section examines LOWESS' ability to smooth two different
sets of this type of data. The first, natural log of energy
dissipation versus depth, Figure 3.12, is a transformed
portion of data collected during a turbulence measuring
experiment conducted by the Department of Oceanography, U.S.
Naval Postgraduate School.

The LOWESS curves obtained by using linear and quadratic
regressions during Step Three of the smoothing procedure were
compared to a quadratic least squares regression line fit to
the same data, Figure 3.13. Higher order regressions were
rejected as plausible solutions because the regression
coefficients Bj, j = 3,4,5... were found to be statistically
insignificant compared to the Bj, j = 0,1,2 constants. A
quadratic relationship also seemed to be a reasonable
assumption since turbulence is a function of pressure which
varies in proportion to depth squared.

Figure 3.12   Natural Log of Energy Dissipation vs Depth.

ANALYSIS OF VARIANCE TABLE

                         SS          DF    MS        F
GRAND MEAN (SEE NOTE)    10275.656     1
REGRESSION                  28.970     2   14.485    32.500
RESIDUAL                    73.094   164     .446
TOTAL                    10377.719   167   62.142

THE SIGNIFICANCE LEVEL OF REGRESSION = .0000
(SIGNIFICANCE LEVEL = AREA UNDER CURVE BEYOND COMPUTED F)
R SQUARE (SEE NOTE) = .284

NOTE: IN WEIGHTED CASE, SEE DESCRIPTION FOR MEANING

Figure 3.13   Quadratic Regression and Analysis of Variance
Table for Ln Energy Dissipation Versus Depth.

Figure 3.14 shows that the LOWESS curves (solid lines) for
the linear (P = 1) smooths follow the general quadratic
regression (dashed lines) for small values of F but flatten
the pattern for large F's. The quadratic (P = 2) LOWESS
curves close in on the regression line as F increases and
produce a fairly good match as F reaches .75.

The quadratic LOWESS curve also appears to follow local peaks
and valleys more accurately for small F's than does its
linear counterpart. This is not unexpected. Figure 3.15 shows
that the characteristically bowed shape of a quadratic curve
produces larger Ŷi values in the middle of a data set (Xi is
located in the middle of the LOWESS neighborhood) than a
straight line fitted to the same data.

The "fits" of Figure 3.14 can be compared analytically, as
was done in the Phase One test, by examining the distribution
of their residuals. Combining these analytical results with
graphical comparisons provides some goodness of fit measure
for the two curves. The nonparametric Smirnov two sample test
[Ref. 12] is appropriate in this case because the
distribution of the residuals is unknown. The results of this
test, conducted at the 95% confidence level (Table III),
indicate that there is no significant statistical difference
between the F=.75 quadratic LOWESS curve and the quadratic
least squares regression line. See the lower right hand plot
of Figure 3.14.
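
The Smirnov two sample statistic is the largest gap between two empirical distribution functions. The Python sketch below is an illustration only; the function names are hypothetical, the critical value uses the common large-sample approximation with coefficient 1.36 for the 5% level, and the sample size of 167 is an assumption inferred from the total degrees of freedom in the analysis of variance table.

```python
import math


def smirnov_statistic(a, b):
    """Largest absolute gap between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Step past every observation equal to v in both samples (handles ties).
        while i < len(a) and a[i] == v:
            i += 1
        while j < len(b) and b[j] == v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def smirnov_critical(n, m, coefficient=1.36):
    """Large-sample two-sided critical value; 1.36 corresponds to the 5% level."""
    return coefficient * math.sqrt((n + m) / (n * m))


same = smirnov_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
apart = smirnov_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
critical = smirnov_critical(167, 167)  # assumed sample sizes
```

Under these assumptions the critical value is close to the Ks(.95) entry of .149 in Table III, so a computed statistic below that value leads to acceptance.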

This example demonstrates that LOWESS works quite well on
unequally spaced data. It also shows that quadratic LOWESS
works better than the linear model when neighborhood sizes
are too large to support the assumption that the neighborhood
points are related linearly. Quadratic LOWESS should be used
whenever the data suggests that that assumption is not true.

ROBUST LOWESS SMOOTHING: ENERGY DISSIPATION DATA

Figure 3.14   LOWESS Smoothing of Energy Dissipation Data
Using Linear and Quadratic Regressions in Step Three.


The second irregularly shaped plot to be smoothed, a lag-1
plot of 200 NEAR(1) random variables, is pictured in Figure
3.16.

Figure 3.15   LOWESS Smoothing of X53 in Energy Dissipation
Data Using Linear and Quadratic Regressions in Step Three.


TABLE III

Smirnov Test Comparing the Distribution of
Residuals from Smoothing and Regression of Energy Data

type    F      T       Ks(.95)   result
lin     .50    .216    .149      reject
lin     .75    .156    .149      reject
quad    .50    .156    .149      reject
quad    .75    .078    .149      accept

The NEAR(1) process, derived by Lawrence and Lewis
[Ref. 13], is a new first order autoregressive time series
model with exponentially distributed marginals. NEAR(1) data
is generated as a simple linear combination of a series, En,
of independent exponential random variables by the model

$$X_n = \epsilon_n + \begin{cases} \beta X_{n-1} & \text{w.p. } \alpha \\ 0 & \text{w.p. } (1-\alpha) \end{cases}$$

Figure 3.16   Lag-1 Plot of NEAR(1) Random Variables
Having Autocorrelation .75.


These NEAR(1) variables have some interesting properties
that make them especially suitable for testing smoothing
routines. They have fixed serial lag-1 correlation, ρ = αβ,
and have conditional expectation

$$E[X_n \mid X_{n-1} = x] = (1-\alpha\beta)\lambda^{-1} + \alpha\beta x$$

The following parameters were used to generate the variables
for the test: α = .83, β = .9, λ = 1. A successful smooth of
Figure 3.16 should produce a straight line of the form

Y = .25 + .75X

not at all what one would expect from looking at the plot.

Figure 3.17 presents comparison plots of robust and
non-robust linear regression and robust and non-robust LOWESS
smoothing of the NEAR(1) data of Figure 3.16. The robust
regression function contained in the IBM GRAFSTAT package was
used in this example.
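
A NEAR(1) series like the one used for this test can be simulated with a short Python sketch. The mixture used for the innovation term below follows the published NEAR(1) construction as recalled here and is an assumption, since it is not spelled out in this section; the function names and the seed are hypothetical. The parameter values .83, .9 and 1 are the ones quoted in the text.

```python
import random


def near1_series(n, alpha, beta, lam=1.0, seed=0):
    """Simulate n values of a NEAR(1) process (assumed innovation mixture)."""
    rng = random.Random(seed)
    # Probability that the innovation is a full exponential variable.
    p_full = (1.0 - beta) / (1.0 - (1.0 - alpha) * beta)
    x_prev = rng.expovariate(lam)
    series = []
    for _ in range(n):
        e = rng.expovariate(lam)
        innovation = e if rng.random() < p_full else (1.0 - alpha) * beta * e
        # With probability alpha the previous value is carried forward, scaled by beta.
        carry = beta * x_prev if rng.random() < alpha else 0.0
        x_prev = innovation + carry
        series.append(x_prev)
    return series


def lag1_correlation(x):
    """Sample correlation between x[t] and x[t+1]."""
    a, b = x[:-1], x[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


data = near1_series(20000, alpha=0.83, beta=0.9)
rho = lag1_correlation(data)
mean = sum(data) / len(data)
```

With these parameters the lag-1 correlation should settle near .83 × .9, roughly .75, which matches the slope of the line a successful smooth is expected to recover.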

Examination of the plots in Figure 3.17 shows, once again,
that LOWESS smooths are comparable to those produced by
accepted linear regression techniques. It also reveals that
neither the linear regression nor LOWESS procedures were able
to reproduce the true lag-1 relationship, (Y = .25 + .75X),
shown in the lower right hand plot. Both robust curves do
present an accurate picture of where most of the data points
lie, and could be used to predict where a majority of the
future points are likely to fall. Relying on these curves,
however, would probably lead to the conclusion that the
points above and below these lines represent outliers, which
may or may not be the case.

It must be concluded from LOWESS' performance on these two
data sets, however, that it smooths unequally spaced data as
well as currently available regression techniques.

Figure 3.17   Comparison of Robust and Non-Robust Linear
Regression and LOWESS Smoothing of the Lag-1 Plot
of NEAR(1) Data.




IV.  USING THE APL VERSION OF LOWESS

A.   OVERVIEW 

This chapter provides prospective users with detailed
instructions for using LOWESS as a stand-alone program or in
combination with the experimental GRAFSTAT graphics package.
In either mode, LOWESS will provide the user with vectors of
robust or non-robust smoothed Ŷi values and their associated
residuals. When used in conjunction with GRAFSTAT, it will
also produce a scatter plot of the original data with the
LOWESS smoothed curve superimposed. A similar type
presentation of the absolute value of the residuals versus Xi
is also available on request from the program, Figure 4.1.

Figure 4.1   Sample of Graphical Outputs from LOWESS
Smooths of the Data (left), and Residuals (right).


LOWESS is a completely interactive program. All user defined
parameters and option selections are entered in response to
program queries. The stand-alone and combined graphics modes
of operation are differentiated only by their initial set up
procedures and by the choice of terminals on which the
program is run.

Although no APL programming skills are required to operate
LOWESS, users should become familiar with system commands and
procedures for entering the APL environment, loading and
copying workspaces and variables and for saving workspaces by
reading appropriate sections of [Ref. 14]. Operating
instructions presented in the follow-on sections of this
chapter have been written for users who have had little or no
experience with APL. Experienced users may find it more
convenient to refer to the summarized procedures presented in
the Tables at the end of this chapter.

LOWESS is not a W.R. Church Computer Center supported
program and is not included in any of the APL libraries
listed in [Ref. 15]. Interested users should contact
Professor  P.A.W.  Lewis,  Department  of  Operations  Research, 
U.S.  Naval  Postgraduate  School,  for  information  concerning 
access  to  the  APL  workspace  DTNLFNS.  This  workspace,  which 
contains  LOWESS  and  several  other  data  analysis  related 
programs,  should  be  copied  and  stored  on  the  user's  A  disk. 

B.   TERMINAL REQUIREMENTS

LOWESS, in the stand-alone mode, can be run on any APL
capable terminal at the U.S. Naval Postgraduate School. The
IBM GRAFSTAT software, which generates the graphical
displays when operating LOWESS in the combined graphics
mode, requires the use of either IBM 3277GA or 3278/79
graphics display terminals. The 3278 terminals require
special modification to produce graphical displays. None of
these terminals are available for public use at the Naval
Postgraduate School. See Table IV for a summary.

C.   PROGRAM INITIALIZATION: STAND-ALONE MODE

Since LOWESS is written in APL, users must enter the APL
sub-environment after completing normal log on procedures.
This is done by typing the letters "APL" and depressing the
enter key. The response "CLEAR WS" indicates that the
computer is ready to accept APL commands.

APL  uses  a  special  character  set  that  is  invoked  by 
keying  the  APL  ON/OFF  key  while  depressing  the  ALT  key  on 
IBM  3278/79  terminals  or  by  merely  hitting  the  APL  ON/OFF 
key  on  the  3277GA  graphics  display  terminals.  These  special 
APL  characters  are  imprinted  in  red  (3278/79  terminals)  or 
black  (3277GA  terminals)  on  the  top  and  front  surfaces  of 
the  normal  keys.  The  symbols  located  on  the  front  of  the 
keys are accessed by typing the appropriate key while
depressing  the  APL  ALT  key.  When  two  APL  characters  are 
pictured  on  the  top  surface  of  the  same  key,  the  uppermost 
character  is  invoked  by  hitting  that  key  while  depressing 
the  SHIFT  key,  much  the  same  as  producing  capital  letters 
during  normal  typing  operations. 

The final step in the initialization procedure consists of
loading LOWESS and associated sub-programs into the active
APL workspace. This is accomplished by entering the system
command ")PCOPY DTNLFNS LOWESS "1. This command copies a
group of programs required to execute LOWESS. See [Ref. 16,
p. 107] for information about the APL GROUP command. The
computer responds by presenting WS size and "date-saved"
information when all programs have been loaded.
Initialization is now complete and the user is ready to
execute LOWESS by typing "LOWESS" and hitting enter. From
this point on, user entries are made in response to program
queries or instructions. Table I summarizes these
initialization procedures.


1. Underscored letters are obtained by typing the desired
letter while depressing the APL ALT key.

D.   PROGRAM INITIALIZATION: COMBINED GRAPHICS MODE

As noted in Section B of this chapter, the combined
LOWESS-GRAFSTAT package can only be run on IBM 3277GA, 3279
or specially configured 3278 graphics display terminals.
Additionally, efficient operation of GRAFSTAT requires a
minimum workspace size of 2 megabytes. The W.R. Church
Computer Center has established a limited number of public
domain workspaces with special account numbers and passwords
to meet this need [Ref. 5]. Hard copy graphics printers are
available for use with the 3277GA terminals located in
Ingersoll, Root and Spanagel Halls. The remainder of this
section focuses on the use of the 3277GA terminals.

Data files stored on the user's personal disk are
unavailable for use while operating in one of the public
workspaces. Users may either:

1.  send files to the public workspace's user number
prior to logging on and commencing a work session; or

2.  link to their own disk after logging on to the
public workspace using CP link procedures outlined
in [Ref. 17].

After logging on to one of the public workspaces and
completing the data transfer or linking procedures described
above, the user must enter the APL sub-environment by typing
"APLGS7"1 and hitting the enter key. The response "CLEAR WS"
indicates that the computer is ready to accept APL commands.

The special APL characters, labelled in black, are invoked
by depressing the APL ON/OFF key. Since this key also turns
the APL characters off, it may be necessary to check their
status by trial and error. Detailed instructions for using
the APL character set are presented in Section C of this
chapter.

1. The command "APLGS7" invokes special system routines
required to support the IBM GRAFSTAT software package. This
procedure may change. Contact Professor P.A.W. Lewis,
Department of Operations Research, if these procedures do
not work.

The initialization procedure is completed by loading
GRAFSTAT and LOWESS into the active APL workspace. GRAFSTAT
should be loaded first, by entering the system command
")LOAD GRAFSTAT". The GRAFSTAT package is quite large and
may take several minutes to load. The following set of user
instructions will appear on the screen when GRAFSTAT is
fully loaded:

THIS IS A NEW (5/1/84) RELEASE OF GRAFSTAT. IT RUNS ON THE
3277/GA OR ON THE 3276/79. IT HAS A NUMBER OF NEW FUNCTIONS.
YOUR OLD CONTROL VECTORS WILL WORK AS BEFORE. IF YOU )COPY
RATHER THAN )LOAD THIS WORKSPACE YOU MUST EXECUTE THE
FUNCTION LATENT BEFORE STARTING. THE NEXT RELEASE IS
SCHEDULED FOR 7/84.

TO BEGIN, TYPE: START

FOR MORE INFORMATION, TYPE: DESCRIBE

It  is  not  necessary  for  the  user  to  start,  or  even 
interact  with  GRAFSTAT  to  smooth  a  set  of  data:  the  GRAFSTAT 
message  may  be  cleared  by  depressing  the  CLEAR  key. 

Users who have the APL workspace DTNLFNS stored on the
public workspace disk, or who are linked to their own
personal disk where it is stored, need only enter ")PCOPY
DTNLFNS LOWESS" to complete the initialization process. The
computer responds by presenting WS size and date saved
information when all programs have been loaded.
Initialization is now complete and the user is ready to
execute LOWESS by typing "LOWESS" and hitting enter. From
this point on, user entries are made in response to program
queries or instructions. See Table VI for a summary of these
procedures.
E.   OPERATION OF LOWESS

This  section  provides  detailed  descriptions  of  the  user 
inputs  required  during  normal  operation  of  LOWESS.  The 
discussion  assumes  that  one  of  the  initialization  procedures 
described  in  Sections  C  and  D  of  this  chapter  has  already 
been  completed. 

Execution of the LOWESS program is initiated by typing
"LOWESS" and hitting the return key. Since the program is
interactive, it will respond with a series of queries or
instructions requesting the user to input data or make
decisions about the operation of the program. The exact sequence
of program-initiated queries and instructions is formulated
in response to user inputs.

User-computer interactions required during execution of
LOWESS are categorized into two types: data input and
program operation.

Since the program cannot operate without data, the
initial concern of LOWESS is to locate and read the data set
it is about to smooth. Data can be read from the active APL
workspace, a stored APL workspace or a stored CMS file.
Data that is not located in the active workspace must be
accessible from that workspace. This presents no problem
when the user is operating under his/her personal user
number and the data is stored on his/her disk. It may
become a problem when the user is logged on to one of the
public workspaces described in Section D of this chapter,
and has not:

1.  sent the data to the public workspace where he/she is
working and stored it on the associated A disk; or

2.  linked to his/her own disk prior to entering the APL
sub-environment (see Section D of this chapter).

Wherever the data is stored, it MUST be formatted into
two separate lists, one containing the X values and the
other containing the corresponding Y values of the points
being smoothed.

Data which resides in the active workspace as APL
vectors¹ is entered into LOWESS when the user types the
variable name and hits enter in response to the appropriate
program requests.

Data which is stored in another APL workspace on the
disk in use, or on a disk to which the user is linked, will
be transferred to the active workspace by the sub-program
DATAINPUT. The user needs only to enter the workspace name
and variable names when requested. DATAINPUT will also read
and convert CMS files stored on the disk in use or on a disk
to which the user is linked, provided they are formatted as
described above and contain only numerical data. A mixture
of alphabetic and numeric characters in a CMS data file will
create an error and terminate execution of LOWESS. These
data transfer features work equally well in either mode
of operation. The IBM GRAFSTAT program contains functions
entitled CMSREAD and CMSWRITE that will convert data in
both directions when operating in the combined graphics
mode. Users will generally not need to use this feature of
GRAFSTAT, however.
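
Because a single alphabetic token terminates LOWESS, it can be worth screening a data file before it is transferred. The following Python sketch is illustrative only and is not part of the LOWESS or GRAFSTAT packages; the helper name `find_non_numeric` and the whitespace-separated layout are assumptions made for the example.

```python
import math


def find_non_numeric(lines):
    """Return (line_number, token) pairs for tokens that are not finite numbers.

    Hypothetical pre-check: an empty result suggests the file holds
    only numerical data, as DATAINPUT requires.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                bad.append((lineno, token))
                continue
            if not math.isfinite(value):   # reject "nan" and "inf" spellings too
                bad.append((lineno, token))
    return bad


if __name__ == "__main__":
    sample = ["0.200 -0.398", "0.400 abc"]
    print(find_non_numeric(sample))
```

A file would normally be read with `open(path).read().splitlines()` and passed to the helper; any reported token should be corrected before LOWESS is started.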

Program operation inputs include:

1.  the value of the parameter F (selection considerations
are discussed in Chapter II, Section C);

2.  whether robust or non-robust smoothing is desired;

3.  whether or not a plot of the original data and
smoothed curve is desired;

4.  whether or not a plot of the absolute values of the
residuals and associated smoothed curve is desired;

5.  X and Y axis labels for these plots.


¹ In APL, a list of data points stored under a single
variable name is referred to as a vector. See [Ref. 14] for
further details.

Plots can only be generated while operating LOWESS in
the combined graphics mode. Requesting plots when GRAFSTAT
has not been loaded will produce an error and terminate
execution. Hard copies of plots may be obtained by
depressing the HARD COPY button on the bottom of the
graphics screen.


TABLE IV

Summary of Terminal Requirements and Available Outputs

Mode                Terminal               Additional           Available
                    Required               Software Required    Output

Stand-Alone         3277GA, 3278, 3279     none                 Numerical:
                                                                YSMTH .. smooth Y
                                                                X1 ..... original X
                                                                Y1 ..... original Y
                                                                RESY ... residuals

Combined Graphics   3277GA, 3279 or 3278   IBM GRAFSTAT pgm     Numerical:
                    with graphics board                         YSMTH .. smooth Y
                                                                X1 ..... original X
                                                                Y1 ..... original Y
                                                                RESY ... residuals

                                                                Graphical:
                                                                Smooth curve
                                                                |Residuals| vs Xi
TABLE V

Initialization Procedures, Stand-Alone Mode

Objective                User Inputs            Program Response

(1) enter APL            "APL"                  "CLEAR WS"
    environment

(2) invoke APL           APL ON/OFF key         none
    characters

(3) load LOWESS and      )PCOPY DTNLFNS         "saved (date) (time)"
    assoc. programs      LOWESS


TABLE VI

Initialization Procedures, Combined Graphics

Objective                User Inputs            Program Response

(1) enter APL            "APLGS7"               "CLEAR WS"
    environment

(2) invoke APL           APL ON/OFF key         none
    characters

(3) load GRAFSTAT        ")LOAD GRAFSTAT"       initialization screen
                                                (see Section D)

(4) load LOWESS          ")PCOPY DTNLFNS        "saved (time) (date)"
                         LOWESS"

(5) execute              "LOWESS"
V.  USING THE FORTRAN VERSION OF LOWESS

A.  OVERVIEW 

This chapter provides prospective users with detailed
instructions for using a FORTRAN program that accomplishes
the LOWESS curve smoothing procedure described in Chapter
II. The program, entitled LOWESS, will provide the user with
CMS files containing robust or non-robust Ŷi values and
their associated residuals. These data files can be used to
create plots of the raw and smoothed data points using
DISPLA [Ref. 7], EASYPLOT, or other W.R. Church Computer
Center supported IMSL or non-IMSL plotting routines.

LOWESS is a completely interactive program. All user
defined parameters and option selections are entered in
response to program queries.

Although no FORTRAN programming skills are required to
operate LOWESS, users should become familiar with FORTRAN
and WATFIV operating system commands and also with the basic
XEDIT editor, by reading appropriate sections of [Ref. 18]
and [Ref. 19]. A limited ability to format, XEDIT and
manipulate data files will be helpful when using LOWESS or
when interacting with any of the plotting routines mentioned
earlier.

B.  TERMINAL REQUIREMENTS

LOWESS can be run on any remote terminal attached to the
IBM computer located at the Naval Postgraduate School. The
DISPLA and EASYPLOT plotting routines require the use of the
IBM 3277GA graphics display terminals located in Ingersoll,
Root and Spanagel Halls. Plotting routines that use the
remote VERSATEC or line printers can be accessed from any
terminal.
C.   PROGRAM INITIALIZATION (FORTRAN VERSION)

Since LOWESS is not a W.R. Church Computer Center
supported program, it is not available in any of the
center's public access libraries. Interested users should
contact Professor P.A.W. Lewis, Department of Operations
Research, U.S. Naval Postgraduate School, for information
concerning access to LOWESS and its supporting programs.
Copies of the programs listed in Table VII should be
obtained and stored on the user's A disk. Annotated copies
of the source codes are contained in Appendix B.


TABLE VII

Programs and Subroutines Required for the
Operation and Support of the FORTRAN Version of LOWESS

Filename     Filetype     Filemode

LOWESS       FORTRAN      A1
LOWS         EXEC         A1
PXSORT       FORTRAN      A1
LLBQF        FORTRAN      A1


PXSORT  and  LLBQF  are  contained  in  the  IMSL  library. 
Users  having  access  to  these  programs  through  the  W.R. 
Church  computer  center  need  not  obtain  personal  copies. 

The LOWS EXEC is used to activate system libraries and to
designate the CMS storage space required for LOWESS input and
output files. It is invoked by typing "LOWS EXEC" and
hitting the ENTER key. The file definitions contained in the
LOWS EXEC are listed in Table VIII. See [Ref. 17] for
information on the use of EXEC executive programs.

This EXEC defines enough file space to accommodate five
data sets. The user need only enter the appropriate file
number when queried by LOWESS to smooth any of the data
sets.
TABLE VIII

Input and Output File Definitions Used in LOWS

File number     Filename     Filetype

2               LOW2         DATA
3               LOW3         DATA
4               LOW4         DATA
7               LOW7         DATA
8               LOW8         DATA


It may become necessary to change these filenames to
avoid losing data when smoothing a large number of data sets
or when smoothing one set a number of times. This may be
accomplished in one of the following ways:

1.  by entering the CMS command "XEDIT LOWS EXEC" and
changing the appropriate names;

2.  by using the CMS command "R (old filename) (old
filetype) (old filemode) (new filename) (new filetype)
(new filemode)" for each file needing to be changed,
see [Ref. 18].

File management is important. It is imperative that data
input files have the same filename, filetype and filemode
listed in the LOWS EXEC, to prevent inadvertent smoothing
of the wrong data or a program error.

D.   DATA FILES (FORTRAN VERSION)

LOWESS requires that data be input in two columns of
floating point constants in (2F15.5) format, X values on the
left and Y values on the right. This is accomplished by
creating a new file with the command "XEDIT (filename)
(filetype)". The filename and filetype chosen should be one
of those listed in Table VIII or one that is contained in
the user's own LOWS EXEC. Refer to [Ref. 19], Chapter 2, for
more detailed instruction on creating files. The (2F15.5)
format requires that all input values contain a decimal
point followed by no more than five decimal places. The X
values must be entered in the first fifteen spaces and the Y
values in the second fifteen spaces of each line (one pair
per line).
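
The column layout described above can also be produced by a short script instead of by hand in XEDIT. The Python sketch below is an illustration only, not part of the thesis software; the function names `format_2f15_5` and `write_input_file` are hypothetical, and the right-justified field is an assumption about how a (2F15.5) record is most conveniently filled.

```python
def format_2f15_5(x, y):
    """Format one (X, Y) pair as a 30-character record.

    Each value occupies a right-justified 15-character field with five
    decimal places, which a FORTRAN (2F15.5) READ can interpret.
    """
    return f"{x:15.5f}{y:15.5f}"


def write_input_file(path, xs, ys):
    """Write paired X and Y values, one pair per line (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    with open(path, "w") as handle:
        for x, y in zip(xs, ys):
            handle.write(format_2f15_5(x, y) + "\n")


if __name__ == "__main__":
    print(repr(format_2f15_5(0.2, -0.398)))
```

Values with a magnitude too large for a 15-character field would overflow the column, so such data should be rescaled before being written.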

The output from LOWESS is placed in a file designated by
the user. This can be the same file used for inputting the
(X,Y) values or a different one. A different file should be
used if the same data set is going to be smoothed with
several different parameters. This output is printed in
(4F15.3) format. The first column contains the original X
values ordered from smallest to largest. Column two contains
the corresponding Y values, column three contains the
smoothed Ŷi values and column four contains the (Yi - Ŷi)
residuals.
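
A plotting or checking script can split each output record into its four 15-character fields. The following Python sketch is a hedged illustration: the function name `parse_4f15_3` is hypothetical, and it assumes every record is a full 60 characters wide, as a (4F15.3) WRITE would produce.

```python
def parse_4f15_3(line):
    """Split one (4F15.3) record into (x, y, y_smooth, residual)."""
    fields = [float(line[start:start + 15]) for start in range(0, 60, 15)]
    x, y, y_smooth, residual = fields
    return x, y, y_smooth, residual


if __name__ == "__main__":
    record = f"{1.0:15.3f}{2.5:15.3f}{2.0:15.3f}{0.5:15.3f}"
    x, y, y_smooth, residual = parse_4f15_3(record)
    # Column four should equal column two minus column three,
    # up to the three-decimal rounding of the output format.
    print(x, y, y_smooth, residual, abs((y - y_smooth) - residual) < 0.002)
```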

E.   OPERATION  OF  LOWESS  (FORTRAN  VERSION) 

This  section  provides  detailed  descriptions  of  the  user 
inputs  required  during  normal  operation  of  LOWESS.  The 
discussion  assumes  that  the  LOWS  EXEC  has  been  properly 
prepared  and  executed  and  that  input  files  have  been  built 
according  to  instructions  presented  in  Section  C  of  this 
chapter. 

Execution  of  the  LOWESS  program  is  initiated  by  typing 
"WATFIV  LOWESS  *  (XTYPE".  Since  the  program  is  interactive, 
it  will  respond  with  a  series  of  queries  or  instructions 
requesting  the  user  to  input  data  or  make  decisions  about 
the  operation  of  the  program. 

The initial concern of LOWESS is to locate and read the
data set it is about to smooth. Data can only be read from
one of the files defined in the LOWS EXEC routine. The user
tells LOWESS what file to read by entering the appropriate
file number (2, 3, 4, 7 or 8) in response to the instruction
"ENTER THE FILE NUMBER OF THE INPUT DATA FILE." The program
will terminate with an error if the LOWS EXEC was not
properly prepared or if the data file was not formatted as
described in the preceding section. Other program requested
inputs include:

1.  the value of the parameter F (selection considerations
are discussed in Chapter II, Section C);

2.  whether robust or non-robust smoothing is desired;

3.  the file number of the desired output file.
APPENDIX A
APL PROGRAMS

This Appendix contains annotated listings of the APL
programs written for this thesis. Source listings of the
system library programs used to support the CMSREAD function
called in the program DATAINPUT are not included.

LOWESS is an interactive program that executes the
Robust Locally Weighted Regression Scatterplot Smoothing
procedure described in the preceding sections of this
paper. It calls the following subprograms during execution:
DATAINPUT, REPEATCK, REGRES, REGRES2, PLOTQUERY and LOWS.
Refer to Chapter IV for detailed user instructions.

    ∇ LOWESS;N;Q;WX;J;I;A;B;Q;STRP;U;D;TX;WT;Z;BR;DA;DB;R;U1;M;RO;AR;RHS;PROCEED;N1;PT;SKP;YS;F;ROB;REG;XAXIS;YAXIS;PHDR;QS5;QS6;PT
      ⍝ DO NOT MOVE OR ERASE, GRAFSTAT FUNCTION HEADER
      ⍝ GRAFSTAT WILL NOT ADD A LINE TO THIS FUNCTION WITHOUT
      ⍝ THIS HEADER
      ⍝
      ⍝ LOWESS CALLS THE FOLLOWING PROGRAMS AND VARIABLES:
      ⍝ DATAINPUT, REPEATCK, PLOTQUERY, REGRES, REGRES2, RPLT,
      ⍝ NRPLT, RESPLT, SRESPLT
      ⍝
      ⎕PP←6
      DATAINPUT
      →L9×⍳(PROCEED≠'N')
      →0
     L9:Y1←Y←Y[⍋X]                                   ⍝ ORDER DATA
      X1←X←X[⍋X]
      'INPUT F ... (0≤F≤1)'
      Q←⌊0.5+Q←(N1←⍴X)×F←⎕
      'DO YOU WANT TO USE LINEAR OR QUADRATIC FITTING DURING '
      'THIS SMOOTHING ROUTINE?'
      '(LIN OR QUAD)'
      REG←1↑⍞
      'DO YOU WANT TO USE THE ROBUST SMOOTHING OPTION?'
      '(YES OR NO)'
      ROB←1↑⍞
      YS←N1⍴0
      WX←N1⍴1
      J←0
     L1:J←J+1                                        ⍝ COUNTER FOR ROBUST SMOOTHING LOOP
      I←0
      A←1                                            ⍝ STARTS FIRST STRIP AT X1 ... XQ
      B←Q
     L2:I←I+1                                        ⍝ INCREMENTS THROUGH X1 ... XN
      →L6×⍳(I>N1)
      REPEATCK                                       ⍝ PREVENTS COMPUTATIONS OF Y FOR REPEAT X
      →L5×⍳(SKP='Y')
      STRP←(A+(0,⍳(B-A)))
      →L3×⍳0≠D←⌈/|U←(X[I]∘.-X[STRP])                 ⍝ COMPUTES DI
      YS[I]←(+/(LST/Y))÷(+/LST←X=X[I])               ⍝ USES AVG Y IF DI=0
      →L5
     L3:WT←WX[STRP]×TX←((1-(|U*3))*3)×((|U←U÷D)<1)   ⍝ TRICUBE WT FCN
     L4:→R2×⍳(REG≠'L')
      X[STRP] REGRES Y[STRP]                         ⍝ WEIGHTED REGRESSIONS
      →L5
     R2:X[STRP] REGRES2 Y[STRP]
     L5:→L2×⍳(B≥N1)∨(I≥N1)                           ⍝ ADVANCE STRIP
      →L2×⍳((DA←(X[I+1]-X[A]))≤(DB←(X[B+1]-X[I+1])))
      A←A+1
      B←B+1
      →L5
     L6:RO←|R[⍋(|R←RESY←(Y-YS))]
      →L10×⍳(0≠M←0.5×+/|(RO[(⌈N1÷2),1+⌊N1÷2]))
      U1←1
      →L11
     L10:U1←R÷(6×M)
     L11:WX←((1-(U1*2))*2)×((|U1)<1)                 ⍝ BICUBE WT FCN
      →L7×⍳(ROB≠'Y')
      →L1×⍳(J≤2)
     L7:PLOTQUERY                                    ⍝ RUN PLOTS
      YSMTH←YS
      →L8×⍳(PT≠'Y')
      →0
     L8:'THE OUTPUT FROM THIS LOWESS SMOOTHING IS STORED UNDER THE'
      'FOLLOWING VARIABLE NAMES:'
      '      YSMTH  SMOOTHED Y VALUES'
      '      X1     X VALUES ARRANGED IN ASCENDING ORDER'
      '      Y1     ORIGINAL Y VALUES'
      '      RESY   RESIDUALS'
    ∇
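
For readers without an APL system, the flow of the LOWESS listing can be sketched in Python. This is a reading aid under stated assumptions, not the thesis program: the function names are hypothetical, only the linear (REGRES) option is shown, the clamp on q and the early exit when the median absolute residual is zero are assumptions added for safety (the listing instead sets U1 to 1), and the use of square roots of the weights mirrors what the REGRES listing appears to do.

```python
import math


def tricube(u):
    """Neighbourhood weight (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Robustness weight (1 - u^2)^2 for |u| < 1, else 0."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def weighted_line(xs, ys, ws, x0):
    """Value at x0 of a weighted straight-line fit (closed form)."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    den = sw * swxx - swx ** 2
    if abs(den) <= 1e-4:               # degenerate strip: use the plain mean
        return sum(ys) / len(ys)
    slope = (sw * swxy - swx * swy) / den
    return (swy - slope * swx) / sw + slope * x0


def lowess(x, y, f=0.5, robust=True):
    """Return (sorted x, sorted y, smoothed y, residuals)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two (x, y) pairs of equal length")
    order = sorted(range(n), key=lambda k: x[k])
    xs = [float(x[k]) for k in order]
    ys = [float(y[k]) for k in order]
    q = min(n, max(2, int(math.floor(n * f + 0.5))))   # clamp is an assumption
    rob_w = [1.0] * n
    fit = [0.0] * n
    resid = [0.0] * n
    for _ in range(3 if robust else 1):    # first pass plus two robust passes
        a, b = 0, q - 1                    # strip of q neighbours, 0-based
        for i in range(n):
            if i > 0 and xs[i] == xs[i - 1]:
                fit[i] = fit[i - 1]        # repeated X: reuse the fitted value
            else:
                strip = range(a, b + 1)
                d = max(abs(xs[i] - xs[k]) for k in strip)
                if d == 0.0:               # all neighbours share this X
                    same = [ys[k] for k in range(n) if xs[k] == xs[i]]
                    fit[i] = sum(same) / len(same)
                else:
                    ws = [math.sqrt(rob_w[k] * tricube((xs[i] - xs[k]) / d))
                          for k in strip]
                    fit[i] = weighted_line([xs[k] for k in strip],
                                           [ys[k] for k in strip], ws, xs[i])
            # advance the strip while the next point is nearer its right edge
            while (b < n - 1 and i < n - 1
                   and xs[i + 1] - xs[a] > xs[b + 1] - xs[i + 1]):
                a += 1
                b += 1
        resid = [yi - fi for yi, fi in zip(ys, fit)]
        abs_sorted = sorted(abs(r) for r in resid)
        med = 0.5 * (abs_sorted[(n - 1) // 2] + abs_sorted[n // 2])
        if med == 0.0:                     # assumption: stop instead of U1 = 1
            break
        rob_w = [bisquare(r / (6.0 * med)) for r in resid]
    return xs, ys, fit, resid
```

Calling `lowess(x, y, f=0.5, robust=False)` corresponds to answering NO to the robust smoothing query; the returned lists play the roles of X1, Y1, YSMTH and RESY.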


DATAINPUT controls the data entry portion of the
procedure. Data and program operating parameters are entered in
response to program queries. DATAINPUT accepts data that is
stored in the active APL workspace, transfers data from
other APL workspaces and converts CMS data into APL.
    ∇ DATAINPUT;QS1;QS2;QS3
      PROCEED←'Y'
      ' '
      'IS YOUR DATA SET LOCATED IN THIS WORKSPACE?'
      '(YES OR NO)'
      QS1←1↑⍞
      →LP1×⍳(QS1='N')
      'ENTER THE NAME OF THE X VARIABLE'
      X←⎕
      'ENTER THE NAME OF THE Y VARIABLE'
      Y←⎕
      →END
     LP1:'IS YOUR DATA LOCATED:'
      '  (1) IN AN APL WORKSPACE LOCATED ON THIS DISK OR ON A DISK'
      '      THAT YOU ARE LINKED TO;'
      '  (2) IN A CMS FILE ON THIS DISK OR ON A DISK THAT YOU ARE'
      '      LINKED TO;'
      '  (3) NEITHER (1) OR (2) ABOVE.'
      'ENTER (1,2 OR 3)'
      QS2←⎕
      →(LP2,LP3,LP4)[QS2]
     LP2:'TO TRANSFER YOUR DATA TO THIS WORKSPACE:'
      '  (1) TYPE ...)PCOPY (WS NAME) (X VARIABLE NAME) (Y VARIABLE NAME)'
      '      EXAMPLE:  )PCOPY DATA X Y'
      '      IF YOUR DATA IS STORED AS TWO SEPARATE VARIABLES'
      '  (2) TYPE ...)PCOPY (WS NAME) (VARIABLE NAME)'
      '      EXAMPLE:  )PCOPY DATA ARRAY'
      '      IF YOUR DATA IS STORED UNDER A SINGLE VARIABLE NAME'
      '      AS IN A TWO DIMENSIONAL ARRAY'
      ' '
      'DATE AND TIME SAVED INFORMATION IS DISPLAYED'
      'WHEN THE TRANSFER IS COMPLETE. THEN ENTER   →GO'
      'TO CONTINUE THE LOWESS SMOOTHING PROGRAM'
      S∆DATAINPUT←GO
     GO:'DO YOU NEED TO DEFINE YOUR X AND Y VARIABLES ANY FURTHER?'
      'ANSWER NO IF YOU ENTERED SEPARATE X AND Y VARIABLE NAMES'
      'IN THE PRECEDING STEP. OTHERWISE ANSWER YES.'
      '(YES OR NO)'
      QS3←1↑⍞
      →END×⍳(QS3='N')
      'DEFINE THE X VARIABLE'
      X←⎕
      'DEFINE THE Y VARIABLE'
      Y←⎕
      →END
     LP3:'TO TRANSFER YOUR CMS DATA FILE TO THIS WORKSPACE:'
      '  (1) ANSWER THE FOLLOWING QUESTIONS ABOUT YOUR X DATA FILE'
      X←CMSREAD
      '  (2) ANSWER THE FOLLOWING QUESTIONS ABOUT YOUR Y DATA FILE'
      Y←CMSREAD
      'YOU ARE NOW READY TO PROCEED WITH LOWESS'
      →END
     LP4:'YOUR DATA MUST BE STORED IN AN APL WORKSPACE OR IN A CMS FILE'
      'LOCATED ON THIS DISK OR ON A DISK TO WHICH YOU ARE LINKED. LOWESS'
      'IS BEING TERMINATED. PLEASE COMPLY WITH CONDITION (1) OR (2)'
      'AND REINITIATE LOWESS.'
      PROCEED←'N'
     END:S∆DATAINPUT←0
    ∇
REPEATCK reduces the number of computations required to
smooth a data set by assigning the same smoothed Y value to
data points that have the same X value.

    ∇ REPEATCK
      SKP←'N'
      →END×⍳(I≤1)
      →END×⍳(X[I]≠X[I-1])
      YS[I]←YS[I-1]
      SKP←'Y'
     END:
    ∇


PLOTQUERY controls the graphical output when operating
with the IBM GRAFSTAT statistical graphics package.
It calls the sub-program LOWS to smooth the absolute value
of the (Yi - Ŷi) residuals obtained from smoothing the
original data.

    ∇ PLOTQUERY
      ' '
      'DO YOU WANT A PLOT OF YOUR LOWESS SMOOTHED CURVE?'
      '(YES OR NO) ENTER NO IF NOT USING GRAFSTAT'
      PT←1↑⍞
      →END×⍳(PT≠'Y')
      'INPUT X AXIS LABEL'
      XAXIS←⍞
      'INPUT Y AXIS LABEL'
      YAXIS←⍞
      →PL1×⍳(ROB≠'Y')
      PHDR←'ROBUST LOWESS SMOOTHING; F = ',⍕F
      RUN RPLT
      →PL2
     PL1:PHDR←'NON-ROBUST LOWESS SMOOTHING; F = ',⍕F
      RUN NRPLT
     PL2:'DO YOU WANT A PLOT OF |RESIDUALS| VS X?'
      '(YES OR NO)'
      QS5←1↑⍞
      →END×⍳(QS5≠'Y')
      'DO YOU WANT THIS PLOT SMOOTHED?'
      '(YES OR NO)'
      QS6←1↑⍞
      →PL3×⍳(QS6≠'Y')
      X LOWS(|RESY)
      RUN SRESPLT
      →END
     PL3:RUN RESPLT
     END:
    ∇
LOWS is used to smooth the (Yi - Ŷi) residuals obtained
from smoothing the original data set. It operates exactly
like LOWESS except for the data input and graphical output
sections.

    ∇ X LOWS Y;N1;Q;WX;J;I;A;B;Q;STRP;U;D;TX;WT;Z;BR;DA;DB;R;U1;M;RO;AR;RHS;YZ
      Y←Y[⍋X]
      X←X[⍋X]
      Q←⌊0.5+Q←(N1←⍴X)×F
      YS←N1⍴0
      WX←N1⍴1
      J←0
     L1:J←J+1
      I←0
      A←1
      B←Q
     L2:I←I+1
      →L6×⍳(I>N1)
      REPEATCK
      →L5×⍳(SKP='Y')
      STRP←(A+(0,⍳(B-A)))
      →L3×⍳0≠D←⌈/|U←(X[I]∘.-X[STRP])
      WT←WX[STRP]×TX←Q⍴1
      YS[I]←(+/(LST/Y))÷(+/LST←X=X[I])
      →L5
     L3:WT←WX[STRP]×TX←((1-(|U*3))*3)×((|U←U÷D)<1)
     L4:→R2×⍳(REG≠'L')
      X[STRP] REGRES Y[STRP]
      →L5
     R2:X[STRP] REGRES2 Y[STRP]
     L5:→L2×⍳(B≥N1)∨(I≥N1)
      →L2×⍳((DA←(X[I+1]-X[A]))≤(DB←(X[B+1]-X[I+1])))
      A←A+1
      B←B+1
      →L5
     L6:RO←|R[⍋(|R←(Y-YS))]
      →L10×⍳(0≠M←0.5×+/|(RO[(⌈N1÷2),1+⌊N1÷2]))
      U1←1
      →L11
     L10:U1←R÷(6×M)
     L11:WX←((1-(U1*2))*2)×((|U1)<1)
      →L12×⍳(ROB≠'Y')
      →L1×⍳(J≤2)
     L12:
    ∇
REGRES computes linear least squares regressions of Y on
X while REGRES2 computes quadratic least squares regressions
of Y on X.

    ∇ XR REGRES YR;DEN;W1;B1;B2
      DEN←((+/W1)×(+/W1×XR*2))-((+/XR×W1←WT*0.5)*2)
      →L1×⍳((|DEN)>0.0001)
      YS[I]←(+/YR)÷⍴YR
      →0
     L1:B2←(((+/W1)×(+/(W1×XR×YR)))-((+/W1×XR)×(+/W1×YR)))÷DEN
      B1←((+/W1×YR)-B2×(+/W1×XR))÷(+/W1)
      YS[I]←B1+B2×X[I]
    ∇

    ∇ X2 REGRES2 Y2
      A1←(+/X2×(WT*0.5))
      A2←(+/(X2*2)×(WT*0.5))
      A3←(+/(X2*3)×(WT*0.5))
      AR2← 3 3 ⍴(+/WT*0.5),A1,A2,A1,A2,A3,A2,A3,(+/(X2*4)×(WT*0.5))
      RHS2←(+/Y2×WT*0.5),(+/X2×Y2×WT*0.5)
      RHS2← 3 1 ⍴RHS2,(+/(X2*2)×Y2×WT*0.5)
      BR←RHS2⌹AR2
      YS[I]←BR[1;1]+(BR[2;1]×X[I])+(BR[3;1]×X[I]*2)
    ∇
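
The quadratic option amounts to solving a 3 by 3 set of weighted normal equations at each point. The Python sketch below illustrates that step under stated assumptions: the function names are hypothetical, the weights are taken as given (the REGRES2 listing appears to pass square roots of the combined weights), and a small Gaussian elimination stands in for whatever solver the APL system provides.

```python
def solve_linear(a, b):
    """Solve a small dense system a * beta = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def weighted_quadratic(xs, ys, ws, x0):
    """Value at x0 of the weighted least squares quadratic through (xs, ys)."""
    # s[k] = sum of w * x^k for k = 0..4; t[k] = sum of w * y * x^k for k = 0..2
    s = [sum(w * x ** k for w, x in zip(ws, xs)) for k in range(5)]
    t = [sum(w * y * x ** k for w, x, y in zip(ws, xs, ys)) for k in range(3)]
    a = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    b0, b1, b2 = solve_linear(a, t)
    return b0 + b1 * x0 + b2 * x0 ** 2


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
    print(weighted_quadratic(xs, ys, [1.0, 0.5, 0.25, 0.5, 1.0], 2.5))
```

At least three distinct X values with non-zero weight are needed; otherwise the system is singular and the sketch raises an error rather than returning a value.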


The following character strings are the screen vectors
used by the RUN function of GRAFSTAT to produce the plots of
the LOWESS smoothed curves of the original data and of the
absolute values of the residuals.

**NRPLT  73     CHARACTER 

M9X1  9Y1jYS90  lfrt&.*-MVAO«!ta'  '9FHDR9XAXIS9YAXIS9219LIN9LIN91  1  190  1 
0  6 


**RESPLT  BO     CHARACTER 

«M9X9<  |RESY>90919.*+x*ao«+t9'  '9'  '9XAXIS9'  IRESIDUALSJ  •  9229LIN9LIN91  1 
190  1  0  09 


kkRPLT  73     CHARACTER 

M9X19Y1iYS90  i919.»+K9AOi|f9"9PHDR9XAXIS9YAXI£9219LIN9LIN91  1  190  1 
0  6 


**SRESPLT  85     CHARACTER 

n«1&X9<|RESY)jYS90 

1919.*  +  *^AO«ft9'  '9"9XAXIS9'  |RESIDUALS|  • 9229LIN9LIN91  1  1  90  1  0 
09 




APPENDIX  B 
FORTRAN  PROGRAMS 

This appendix contains a listing of the FORTRAN program
and subroutine written to support this thesis. The IMSL
programs LLBQF and PXSORT, used to support the LOWESS
program, are not listed. Detailed user instructions for
operating these programs are contained in Chapter V.


'It  « 

      INTEGER AX,BX,A1,Q,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,N,IWK(2),IER,ROB
     C,IF1,IF2
C
      DATA AX/1/,ROB/-1/,N/0/
C
      F=.33
      IF1=2
      IF2=4
      N=0
    1 N=N+1
      READ(IF1,901,END=2)X(N),Y(N)
      GO TO 1
    2 N=N-1
      CALL XYSORT(X,Y,1,N)
      Q=IFIX((FLOAT(N)*F)+.5)
    4 CONTINUE
      AX=1
      A1=(AX-1)
      BX=Q
      DO 65 I1=1,N
         I2=0
         D=0.0
         DO 10 I3=AX,BX
            I2=I2+1
            U(I2)=X(I1)-X(I3)
            IF(.NOT.ABS(U(I2)).GE.D) GO TO 5
               D=ABS(U(I2))
    5       CONTINUE
   10    CONTINUE
         IF(.NOT.D.GT.0.00001)GO TO 30
         DO 25 I4=1,Q
            U1=ABS(U(I4)/D)
            IF(.NOT.U1.LT.1.0)GO TO 15
               TX(I4)=(1.0-(U1**3))**3
               WT(I4)=TX(I4)*WX(A1+I4)
               GO TO 20
   15       CONTINUE
               TX(I4)=0.0
               WT(I4)=0.0
   20       CONTINUE
   25    CONTINUE
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30 


35 
40 


45 
50 


55 
60 

65 


70 


71 


75 

80 
85 

991 


90 

900 
901 


GO 

CONTI 
DO 


CO 
CONTI 


40 


TX(I5) 
WT  (15) 

TINUE 


A 
A 
A 
A 
B 
B 
DO 


1,1 

1/2 
2,1 

2,2 

1,1 

2t  1 
45 

17 

W= 

M 

Ai 

A 

B 

Bs 

CCNTI 
A  (2,1 
CALL 
YS  (11 
CO 


TO 
NDE 
35  15 
15 
5 
NTINUE 
NUE  C 
=  0.0 
=  0.0 
=  0.0 
=  0.0 
=  0.0 
=  0.0 
16  =  1. 
=A1+I6 
SQET(W 


=WX  (A1  +  I5) 


{. 


IP 
IF 


(BX 

(H 

DA 
DB 
IF 


CO 

CONTI 

A1=(A 

CCNTINUE 

DO  70  18 

R]l8 

ET  (I 

CCNTINUE 

CALL  PXS 

L1=  (N+1) 

L2=(N  +  2J 

MED=  (E1  f 

DO  85  19 

IF  (  (E1 

WX 

GO 

EO=E  ( 

IF  (.N 

WX 

GO 

CCNTI 

WX 

CCNTI 

CONTINUE 

WEITE  (6, 

FCFMATjl 

FCE=EOB+ 

DO  9  0  11 

WEITE 

CCNTINUE 

STCP 

FORMAT 

FCFMAT 

ZNE  C 

SUBEOUTI 


1 

2 

2 

1 

1 

NUE 

)=A(1, 

LBQF( 

=  BETA 

TINUE 

.GE.N) 

.GE.  N) 

=  X  (11  + 

=  X  (BX  + 

(.NOT. 

AX=AX 

BX=BX 

GO  TO 

NTINUE 

NUE 

x-D 

=  1,N 

=  Y(I8) 
) =ABS( 

C 
OET(R1 
/2 
/2 

L1)  +  E1 
=  1-N 
(19)  .G 
(195  =1 
TO  80 
l9)/(6 
OT.ABS 
(I9)=( 
TO  80 
NUE 
(19)  =0 
NUE  C 
C  TES 


GO 
GC 

1: 

+  1 
+  1 

50 


TO  6 
TO  6 
X  (AX 
X(I1 

GT.D 


X  (17)  *W) 

W*fX  (17)  **2)  ) 

Y(I7)*Wj 

Y  (17)  *X(I7)  *W) 

,B,2. 1.0,C,3ETA,2,IWK,WK,IER)  C 
ETA  (2,1)  *X  (11) 

0 
0 

B) GO  TO  55 


-YS(I8) 
H(I8)) 

,1,N)  C 


(!2))/2.0 

I. 0.0)  . AND.  (ABS(MED)  .GT.0.0) )  GO  TO  71 
.0 


C*MED) 
fFU)  .IT.  1.  0) 
1.0-(EU**2)) **2 


;EU)  -IT.  1.  0)  GO  TO  75 


991 


.0 

T 


91WWXjL)L=1,  N) 
X,  10F7.3)  C  END  TEST  C 
IF(.NOT.FOB.GE. 


1"  C 

0=1. N 
(IF2,9 


(.NOT.  FOB. GE. 2)  GO  TO  4 
OO)X(HO)  ,Y(I10)  ,YS(I10) 


B 


X,3F15 
F15.3) 


•3) 

NE  XYSCFT (A,B,II, JJ)  C 
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DIMENSION    A(JJ),B(JJ) 
M=  1 
1  =  11 
J=JJ 
5    IF  (I    .GE.     J)  GO    10    70 
10    K=I 


ID  (16)  ,IL(16) 


IJ=  (I  +  J)  /2 


IF 
A 


19) 


T1=B  I 

B    IJ)=B(I) 
A  (I)=T 
B  (I)=T1 


mh 


LE.     T)    GO    TO    20 


T=A{IJ) 
T1=B  (IJ) 
20    L  =  J 


IF(A(J)     .GE, 
E  (IJ)=B(JJ 


*J)=T 


T)    GO    TO    40 


B    J 


\=l 


1 


T=A  (IJ) 
T1=E  (IJ) 
IF(A(I)     -IE. 
A  (IJ)=A(I) 
B  (IJ)  =B(I) 
A    I)=T 
B  (I)  =T1 
T=A(IJ) 
T1=B(IJ) 
GO    TO    40 
30    TT  =  A(L) 
TT1  =  B  (L) 


T)    GO    TO    40 


-sub 


A  (I)=, 

b  'i)=: 

A(K)  =TT 

B  (K)=TT1 
40    L=I-1 

IF  (A  (I)     .GT. 
50    K  =  K+1 


T)    GO    TO    40 


IF(A(K)     .IT. 
.LE. 


Tl 


GO    TO    50 

IF  (K  ".LE.    L)     GO   TO    30 

IF  (L-I    .LE.    J-K)     GO    TO    60 

IL  (M)  =1 

ID    M)=L 

I  =  K 

M=M+1 

GO  TO  80 
60    IL  (M)=K 

ID  (M)  =J 

J=I 

M  =  M+1 

GO  TO  80 
70    M=M-1 

IF(M    .EQ. 

I=IL  (M) 

J=ID(M) 

IF(J-I    .GE.     11)  GO    TO    10 

IFjI    .EQ.     II)     GC   TO    5 

90    1=1+1 

IF  (I    .EQ.     J) 
IF    A  (I)     -LE. 
T    =    AJI+1) 
T1=B  (1+1) 
K=I 
100    A  (K+1)=A (K) 
B(K+1)=B  (K) 


0)     EFTOBN 


80 


GC    TO    70 

A  (1+1) )     GO    TO    90 
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K=  K—  1 

IF  (T  .LT.  A (K)  )  GO  TO  100 

A  (K+1)=T 

B  K+1)=T1 

GO  TO  90 

END  SENTRY 

The following LOWS EXEC routine sets the file definitions
and invokes the appropriate systems libraries required
to execute LOWESS. This routine is executed by typing "LOWS
EXEC."


GLOBAL MACLIB IMSLSP NONIMSL
FILEDEF 02 DISK LOW2 DATA A (PERM
FILEDEF 03 DISK LOW3 DATA A (PERM
FILEDEF 04 DISK LOW4 DATA A (PERM
FILEDEF 07 DISK LOW7 DATA A (PERM
FILEDEF 08 DISK LOW8 DATA A (PERM
APPENDIX C
DATA SETS

This appendix contains four data sets that were used to
compare LOWESS with the MOVING AVERAGE, COSINE ARCH and LEAST
SQUARES REGRESSION routines in Chapter III. They include:

1.  TEST SET ONE ... used to test LOWESS' ability to
detect and follow linear trends.

2.  TEST SET TWO ... used to check LOWESS' performance on
data sets that contain abrupt changes in curvature.

3.  TEST SET THREE ... used to test LOWESS' ability to
follow smooth changes in curvature.

4.  Lag-1 points from NEAR(1) data ... used to check
LOWESS' performance on unequally spaced data.
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TABLE 

IX

Data Set One

       X       Y       X       Y       X       Y

    .200   -.398  10.200   8.696  20.200  21.520
    .400   -.811  10.400  10.305  20.400  19.996
    .600   -.103  10.600  10.997  20.600  21.018
    .800   1.156  10.800  10.273  20.800  21.047
   1.000   1.653  11.000  11.345  21.000  21.704
   1.200   1.416  11.200  10.477  21.200  21.832
   1.400   1.136  11.400  12.668  21.400  20.408
   1.600   3.402  11.600  11.569  21.600  23.367
   1.800   1.157  11.800  12.578  21.800  21.418
   2.000   2.110  12.000  14.180  22.000  21.089
   2.200   1.481  12.200  12.638  22.200  21.204
   2.400   2.821  12.400  13.733  22.400  23.595
   2.600    .669  12.600  12.851  22.600  22.441
   2.800   3.460  12.800  12.490  22.800  25.504
   3.000   1.897  13.000  12.077  23.000  22.802
   3.200   3.097  13.200  12.815  23.200  23.059
   3.400   2.340  13.400  14.558  23.400  23.811
   3.600   2.361  13.600  14.463  23.600  22.421
   3.800   1.911  13.800  12.765  23.800  23.522
   4.000   3.026  14.000  13.807  24.000  22.419
   4.200   4.412  14.200  12.900  24.200  25.249
   4.400   4.893  14.400  14.707  24.400  24.703
   4.600   6.147  14.600  15.569  24.600  23.373
   4.800   5.445  14.800  14.053  24.800  24.870
   5.000   2.852  15.000  12.204  25.000  24.603
   5.200   4.171  15.200  15.897  25.200  26.589
   5.400   5.258  15.400  18.607  25.400  26.764
   5.600   3.073  15.600  16.136  25.600  26.258
   5.800   5.487  15.800  16.098  25.800  26.291
   6.000   5.406  16.000  16.284  26.000  26.801
   6.200   6.532  16.200  17.160  26.200  25.433
   6.400   6.959  16.400  18.488  26.400  26.764
   6.600   7.500  16.600  18.125  26.600  26.202
   6.800   6.599  16.800  16.605  26.800  27.664
   7.000   6.766  17.000  17.017  27.000  26.822
   7.200   8.650  17.200  17.446  27.200  29.074
   7.400   9.236  17.400  16.546  27.400  27.572
   7.600   7.217  17.600  18.758  27.600  28.872
   7.800   7.955  17.800  17.962  27.800  27.765
   8.000   7.035  18.000  19.557  28.000  26.499
   8.200   8.239  18.200  18.006  28.200  28.565
   8.400   9.165  18.400  20.051  28.400  28.201
   8.600   8.005  18.600  16.701  28.600  27.210
   8.800   8.930  18.800  20.623  28.800  29.029
   9.000   9.035  19.000  17.482  29.000  29.271
   9.200   8.575  19.200  18.149  29.200  28.834
   9.400   8.860  19.400  19.450  29.400  30.777
   9.600  11.480  19.600  18.145  29.600  28.802
   9.800   8.796  19.800  20.267  29.800  28.863
  10.000   9.503  20.000  20.545  30.000  29.998
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TABLE 

X

Data Set Two

       X       Y       X       Y       X       Y       X       Y

    .200   -.462  11.200   3.849  22.200   4.819  33.200   1.657
    .400  -2.191  11.400   4.554  22.400   4.469  33.400   2.245
    .600   1.405  11.600   3.182  22.600   4.997  33.600    .862
    .800    .947  11.800   3.159  22.800   6.256  33.800   3.226
   1.000    .475  12.000   4.518  23.000   6.278  34.000   1.362
   1.200    .832  12.200   5.736  23.200   6.490  34.200   2.923
   1.400   -.137  12.400   4.989  23.400   5.499  34.400   2.736
   1.600   2.336  12.600   3.752  23.600   5.860  34.600   1.736
   1.800    .779  12.800   5.165  23.800   4.325  34.800   2.129
   2.000   2.597  13.000   4.052  24.000   4.949  35.000   1.433
   2.200   1.144  13.200   3.594  24.200   6.690  35.200   1.313
   2.400   1.832  13.400   3.895  24.400   6.339  35.400   2.756
   2.600   -.406  13.600   3.747  24.600   5.899  35.600   1.576
   2.800    .419  13.800   4.171  24.800   4.233  35.800    .363
   3.000   2.446  14.000   4.962  25.000   5.825  36.000   2.955
   3.200    .641  14.200   3.356  25.200   5.742  36.200    .266
   3.400   1.937  14.400   4.792  25.400   4.873  36.400   1.664
   3.600   1.080  14.600   5.593  25.600   5.497  36.600    .323
   3.800   1.384  14.800   4.630  25.800   7.697  36.800    .783
   4.000    .251  15.000   5.203  26.000   4.600  37.000   1.419
   4.200    .410  15.200   4.468  26.200   3.374  37.200   1.997
   4.400   2.745  15.400   6.558  26.400   2.242  37.400    .533
   4.600   1.795  15.600   5.484  26.600   4.078  37.600   1.137
   4.800   1.121  15.800   2.766  26.800   4.090  37.800    .506
   5.000   1.235  16.000   4.635  27.000   3.519  38.000    .671
   5.200   2.942  16.200   2.812  27.200   6.651  38.200   -.612
   5.400   2.104  16.400   5.668  27.400   5.513  38.400    .376
   5.600   2.753  16.600   5.055  27.600   5.141  38.600   1.921
   5.800   2.717  16.800   5.319  27.800   4.818  38.800   -.476
   6.000   3.156  17.000   5.574  28.000   1.451  39.000  -1.014
   6.200   2.880  17.200   6.472  28.200   5.936  39.200   1.788
   6.400   1.219  17.400   4.420  28.400   4.205  39.400   1.306
   6.600   3.015  17.600   4.623  28.600   3.202  39.600    .853
   6.800   3.845  17.800   5.396  28.800   1.977  39.800  -1.468
   7.000   3.529  18.000   5.778  29.000   4.046  40.000   1.554
   7.200    .503  18.200   3.705  29.200   5.971  40.200   -.542
   7.400   2.686  18.400   4.290  29.400   4.175  40.400  -2.351
   7.600   2.717  18.600   4.900  29.600   4.583  40.600   1.165
   7.800   3.438  18.800   2.397  29.800   3.479  40.800    .627
   8.000   2.689  19.000   6.059  30.000   4.621  41.000    .075
   8.200   3.278  19.200   3.894  30.200   1.989  41.200    .352
   8.400   4.967  19.400   6.093  30.400   4.408  41.400   -.697
   8.600   4.288  19.600   4.174  30.600   3.896  41.600   1.696
   8.800   3.788  19.800   5.615  30.800   3.112  41.800    .059
   9.000   2.677  20.000   5.820  31.000   3.422  42.000   1.797
   9.200   3.610  20.200   4.844  31.200   4.740  42.200    .264
   9.400   3.908  20.400   5.602  31.400   3.108  42.400    .872
   9.600   3.283  20.600   4.933  31.600   3.892  42.600  -1.446
   9.800   3.583  20.800   5.634  31.800   1.630  42.800   -.701
  10.000   4.415  21.000   4.003  32.000   4.039  43.000   1.246
  10.200   5.578  21.200   4.389  32.200   4.600  43.200   -.639
  10.400   1.596  21.400   6.545  32.400   2.125  43.400    .577
  10.600   2.962  21.600   4.540  32.600   1.625  43.600   -.360
  10.800   5.203  21.800   5.417  32.800   1.602  43.800   -.136
  11.000   4.682  22.000   3.613  33.000   3.180  44.000  -1.349
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TABLE 

XI

Data Set Three

       X       Y       X       Y       X       Y

    .063    .261   2.135    .560   4.208  -1.733
    .126   -.129   2.198    .716   4.270   -.860
    .188    .053   2.261   1.376   4.333    .049
    .251   -.293   2.324    .410   4.396   -.870
    .314   1.316   2.386    .988   4.459  -1.282
    .377   1.340   2.449    .326   4.522  -1.701
    .440   -.335   2.512    .875   4.584  -1.025
    .502   1.451   2.575    .175   4.647   -.811
    .565    .088   2.638   1.079   4.710   -.891
    .628    .435   2.700    .520   4.773  -1.088
    .691    .915   2.763   1.167   4.836   -.980
    .754    .522   2.826    .471   4.898   -.662
    .816   1.398   2.889    .684   4.961   -.508
    .879   1.381   2.952    .835   5.024  -1.729
    .942    .011   3.014    .344   5.087   -.599
   1.005    .310   3.077   -.129   5.150  -1.211
   1.068    .496   3.140   -.055   5.212   -.595
   1.130   1.115   3.203   -.543   5.275  -1.151
   1.193    .713   3.266  -1.152   5.338   -.195
   1.256   1.304   3.328   -.111   5.401   -.275
   1.319   1.082   3.391    .024   5.464  -1.133
   1.382    .474   3.454   -.180   5.526   -.982
   1.444   1.062   3.517   -.520   5.589    .206
   1.507    .624   3.580   -.633   5.652   -.113
   1.570    .686   3.642    .088   5.715  -1.503
   1.633   1.695   3.705   -.339   5.778   -.228
   1.696    .168   3.768    .216   5.840   -.232
   1.758   -.025   3.831 [illegible]   5.903   -.824
   1.821   1.215   3.894    .052   5.966   -.949
   1.884    .174   3.956  -1.417   6.029   -.078
   1.947    .860   4.019   -.899   6.092   -.788
   2.010   1.028   4.082   -.310   6.154    .205
   2.072    .743   4.145    .074   6.217   -.100
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TABLE    III 
Lag-1 Data derived from NEAR(1) Process

       X       Y       X       Y       X       Y       X       Y

   1.020    .466    .871    .822    .563    .650    .313    .304
    .035   1.020    .747    .871    .049    .563    .376    .313
    .129    .035   1.385    .747    .133    .049    .329    .376
    .125    .129   1.189   1.385    .334    .133    .363    .329
    .153    .125    .017   1.189    .596    .334    .556    .363
    .233    .153    .261    .017    .604    .596    .655    .556
   2.077    .233    .366    .261    .527    .604    .544    .655
   2.155   2.077    .349    .366    .934    .527    .569    .544
   1.821   2.155    .364    .349    .797    .934    .531    .569
    .042   1.821   1.140    .364    .496    .797    .518    .531
    .036    .042   1.020   1.140    .420    .496    .584    .518
    .061    .036   3.508   1.020    .522    .420   4.292    .584
    .149    .061   3.122   3.508   1.353    .522   3.610   4.292
   4.260    .149   2.623   3.122   1.187   1.353   4.074   3.610
   4.095   4.260   2.654   2.623   1.050   1.187   3.492   4.074
   3.422   4.095    .209   2.654    .898   1.050   3.644   3.492
   2.854   3.422    .255    .209    .854    .898   3.147   3.644
   2.609   2.854    .271    .255   1.631    .854    .022   3.147
   2.176   2.609   1.185    .271   1.363   1.631    .330    .022
   1.823   2.176    .989   1.185    .172   1.363    .310    .330
   1.617   1.823   2.867    .989    .303    .172    .597    .310
   2.439   1.617   2.488   2.867    .229    .303    .551    .597
   2.047   2.439   2.086   2.488    .061    .229    .544    .551
   1.840   2.047   1.756   2.086    .962    .061    .817    .544
   3.049   1.840   1.530   1.756    .907    .962    .808    .817
   2.682   3.049   1.456   1.530    .856    .907    .715    .808
   2.239   2.682    .180   1.456   1.135    .856    .601    .715
   1.889   2.239    .429    .180    .953   1.135    .618    .601
   1.577   1.889    .031    .429   1.728    .953   1.525    .618
   1.664   1.577   2.951    .031    .010   1.728   1.526   1.525
    .103   1.664   2.565   2.951    .073    .010   1.279   1.526
    .133    .103   2.133   2.565    .082    .073   1.065   1.279
    .145    .133   3.737   2.133    .096    .082    .929   1.065
    .207    .145   3.180   3.737    .098    .096    .814    .929
    .221    .207   2.675   3.180    .234    .098    .703    .814
    .196    .221   2.307   2.675   1.046    .234    .704    .703
    .170    .196   1.996   2.307   1.017   1.046    .898    .704
    .185    .170   1.892   1.996   1.239   1.017    .785    .898
    .087    .185   1.700   1.892    .105   1.239   1.065    .785
   2.258    .087   1.716   1.700    .124    .105    .995   1.065
   1.938   2.258   1.599   1.716    .122    .124   3.157    .995
   1.617   1.938   1.498   1.599    .122    .122   2.710   3.157
   1.346   1.617   1.247   1.498    .154    .122   2.265   2.710
   1.184   1.346    .044   1.247    .165    .154   1.883   2.265
   1.007   1.184    .306    .044    .205    .165   1.566   1.883
    .853   1.007    .255    .306    .190    .205   1.488   1.566
    .779    .853    .258    .255    .315    .190   1.268   1.488
    .727    .779    .519    .258    .335    .315   1.206   1.268
    .822    .727    .650    .519    .304    .335   2.825   1.206
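
In the lag-1 table above, each Y value is the X value of the preceding row, so every point pairs one observation of the series with the observation just before it. A minimal Python sketch of that pairing is given below. The function name lag1_pairs is hypothetical, and the short series is simply the first five values read from the table; the thesis itself builds the points with its own APL and FORTRAN programs.

```python
def lag1_pairs(series):
    """Return (x, y) lists where x[i] = series[i + 1] and y[i] = series[i]."""
    if len(series) < 2:
        raise ValueError("need at least two observations to form a lag-1 pair")
    x = list(series[1:])   # current observation
    y = list(series[:-1])  # observation one step earlier
    return x, y


if __name__ == "__main__":
    series = [0.466, 1.020, 0.035, 0.129, 0.125]
    x, y = lag1_pairs(series)
    for xi, yi in zip(x, y):
        print(f"{xi:7.3f} {yi:7.3f}")
```

Because the X values are observations of the process itself rather than points on a regular grid, the resulting scatterplot is unequally spaced in X, which is the property this data set is meant to exercise.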
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